why i stopped fine-tuning and started prompting

for months i was convinced that every ml problem needed a fine-tuned model. custom dataset, hours of training, hyperparameter tuning. the whole ritual.

then i actually sat down and benchmarked a well-crafted prompt against my fine-tuned gpt-3.5 variant on a classification task. the prompt won. not by a lot, but it won — and it took me 20 minutes instead of two days.

fine-tuning still has its place. if you need consistent structured output at scale, or you're working with domain-specific language that base models genuinely struggle with, go for it. but for most tasks i've seen in production, a good system prompt with a few examples gets you 90% of the way there.
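to make "a good system prompt with a few examples" concrete, here's a minimal sketch of how i'd assemble a few-shot classification prompt. the labels, example texts, and helper name are all hypothetical; the message-list shape matches what openai-style chat apis expect, but nothing here actually calls an api.

```python
def build_classification_messages(text, examples):
    """assemble a few-shot chat prompt for a toy sentiment classifier."""
    messages = [{
        "role": "system",
        "content": (
            "classify the user's message as POSITIVE or NEGATIVE. "
            "reply with the label only."
        ),
    }]
    # each labeled example becomes a user/assistant pair the model can imitate
    for sample, label in examples:
        messages.append({"role": "user", "content": sample})
        messages.append({"role": "assistant", "content": label})
    # the actual input to classify goes last
    messages.append({"role": "user", "content": text})
    return messages

# hypothetical examples -- in practice i'd pull a handful from real data
examples = [
    ("loved every minute of it", "POSITIVE"),
    ("total waste of money", "NEGATIVE"),
]
msgs = build_classification_messages("would buy again", examples)
```

twenty minutes of work is mostly picking those few examples well; swapping them is also how you "retrain" this thing, which is the whole appeal.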

the lesson wasn't technical. it was about defaults. i was defaulting to the complex solution because it felt more like "real" ml work. now i default to the simplest thing that could work and only escalate when i have evidence that it's not enough.

simplicity is a feature.