Wait. Isn't it a breaking change to change the underlying model like this? Wouldn't people start running into consistency issues in production? (given ollama appears to be oriented towards backend use)
Yeah this is exactly what happens when you ask a base model a question. It'll just attempt to continue what you already wrote based off its training set, so if you say have it continue a story you've written it may wrap up the story and then ask you to subscribe for part 2, followed by a bunch of social media comments with reviews.
Considering "mixtral:8x22b" on ollama was last updated yesterday, and Mixtral-8x22B-Instruct-v0.1 (the topic of this post) was released about 2 hours ago, they are not the same model.
Mixtral-8x22B-v0.1 was released a couple days ago. The "mixtral:8x22b" tag on ollama currently refers to it, so it's what you got when you did "ollama run mixtral:8x22b". It's a base model only capable of text completion, not any other tasks, which is why you got a terrible result when you gave it instructions.
Mixtral-8x22B-Instruct-v0.1 is an instruction-following model based on Mixtral-8x22B-v0.1. It was released two hours ago and it's what this post is about.
(The last updated 44 minutes ago refers to the entire "mixtral" collection.)
Looks like an issue with the quantization that ollama (i.e llama.cpp) uses and not the model itself. It's common knowledge from Mixtral 8x7B that quantizing the MoE gates is pernicious to model perplexity. And yet they continue to do it. :)
Output: https://gist.github.com/IAmStoxe/7fb224225ff13b1902b6d172467...
Within the first paragraph, it outputs:
> GET AN ESSAY WRITTEN FOR YOU FROM AS LOW AS $13/PAGE
Thought that was hilarious.