I'll be doing a few fine-tunes around a dataset I annotated a few years back, which is probably an interesting use case for structured data extraction. Some links for context:
https://mlops.systems/posts/2024-03-24-publishing-afghanistan-dataset-huggingface.html is a blog post I wrote about the dataset.
https://huggingface.co/datasets/strickvl/isafpressreleases is the original dataset.
https://mlops.systems/posts/2024-06-02-isafpr-prompting-baseline.html describes the context of the task for which I want to fine-tune.
https://mlops.systems/posts/2024-06-03-isafpr-evaluating-baseline.html is a blog post where I examine the baseline performance of GPT-4-Turbo at extracting entities from the text (which is what I hope to achieve with fine-tuning).