i keep running into teams building elaborate keyword search pipelines with elasticsearch, custom tokenizers, synonym dictionaries, and ranking heuristics stacked on top of each other. it works, but it's fragile and expensive to maintain.
last month i replaced one of these pipelines with a vector database and an embedding model. the whole thing is about 40 lines of python. search quality went up, maintenance burden went to near zero.
the trick is choosing the right embedding model. for english text, most of the open-source sentence-transformer models work fine. for multilingual or domain-specific stuff, you might need a specialized model or a re-ranking step on top.
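the re-ranking step is simpler than it sounds: cheap vector search narrows to a candidate set, then a more expensive scorer re-orders just those candidates. a sketch under assumptions — `token_overlap_score` here is a trivial stand-in for a real cross-encoder, which would score each (query, doc) pair jointly with a model:

```python
def token_overlap_score(query: str, doc: str) -> float:
    # hypothetical stand-in for an expensive cross-encoder scorer;
    # a real re-ranker runs a model over the (query, doc) pair
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    # candidates: doc texts from the cheap first-stage vector retrieval.
    # re-score each one with the expensive scorer and keep the best.
    return sorted(
        candidates,
        key=lambda doc: token_overlap_score(query, doc),
        reverse=True,
    )[:top_k]
```

the point of the two-stage design: you only pay the expensive scorer's cost on a handful of candidates, not the whole corpus.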
the part people get wrong: they treat embeddings as a magic black box. you still need to think about chunking strategy, what metadata to store alongside vectors, and how to handle updates. but the core retrieval problem? embeddings solve it better than anything else i've tried for unstructured text.
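chunking is the first of those decisions you'll hit. a minimal sketch: fixed-size chunks with overlap so sentences straddling a boundary appear in at least one chunk whole, and a character offset stored as metadata so each hit links back to its spot in the source document (the sizes here are arbitrary, not a recommendation):

```python
def chunk(text: str, size: int = 400, overlap: int = 50) -> list[dict]:
    # naive fixed-size chunking with overlap. each chunk keeps its
    # character offset as metadata so search results can point back
    # to the exact location in the original document.
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        piece = text[start:start + size]
        if piece:
            chunks.append({"text": piece, "offset": start})
    return chunks
```

real pipelines usually chunk on sentence or section boundaries instead of raw character counts, but the metadata-per-chunk idea carries over unchanged.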
not everything needs a vector database though. if your data is structured and your queries are predictable, a regular index is still faster and cheaper. know your data.