From my understanding, embedding models correspond to just one layer of a modern LLM: the part that handles the similarity described in the post, i.e. translating content into a vector within a common vocabulary space. My understanding is that embedding models can be fine-tuned in isolation to relate vocabulary entries to each other, so fine-tuning a whole LLM is not required, and the compute resources needed would be far smaller.
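As a rough sketch of the similarity part (toy vectors and made-up values, not real model outputs): once content has been embedded as vectors, comparing two pieces of content reduces to simple geometry, most commonly cosine similarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings for three pieces of content.
dog = np.array([0.9, 0.1, 0.0, 0.2])
puppy = np.array([0.85, 0.15, 0.05, 0.25])
car = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine_similarity(dog, puppy))  # high: nearby vectors, related concepts
print(cosine_similarity(dog, car))    # lower: distant vectors, unrelated concepts
```

Fine-tuning an embedding model then amounts to nudging these vectors so that related entries score higher and unrelated ones score lower, without touching the rest of an LLM.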