
Has anyone tried MoE at smaller scales? For example, a roughly 7B model made up of a bunch of smaller experts? I guess that would be 8x1B.

Or would that make each expert too small to be useful? TinyLlama is 1B and it's almost useful! I guess 8x1B would be a Mixture of TinyLlamas...



There is Qwen1.5-MoE-A2.7B, which was made by upcycling the weights of Qwen1.5-1.8B, splitting it and finetuning it.


Yes, there are many fine-tunes on Hugging Face. Search for "8x1B huggingface".


The previous Mixtral is 8x7B.
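
For what it's worth, labels like "8x1B" or "8x7B" are easy to misread. In a Mixtral-style sparse MoE only the feed-forward blocks are replicated per expert, while attention and embeddings are shared, and a router picks a few experts per token. So the total is less than experts times the dense size, and the per-token compute is smaller still. Here is a rough back-of-the-envelope sketch in Python. The function name and the sizes below are made up for illustration and are not measurements of any real model.

```python
def moe_param_counts(shared, per_expert, n_experts, top_k):
    """Return (total, active-per-token) parameter counts for a simple sparse MoE.

    shared     -- parameters used by every token (attention, embeddings, router)
    per_expert -- parameters in one expert's feed-forward blocks, summed over layers
    n_experts  -- number of experts per MoE layer
    top_k      -- number of experts the router selects for each token
    """
    if not 1 <= top_k <= n_experts:
        raise ValueError("top_k must be between 1 and n_experts")
    total = shared + n_experts * per_expert   # what has to fit in memory
    active = shared + top_k * per_expert      # what each token actually runs through
    return total, active


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
total, active = moe_param_counts(
    shared=300_000_000, per_expert=700_000_000, n_experts=8, top_k=2
)
print(f"total: {total / 1e9:.1f}B, active per token: {active / 1e9:.1f}B")
# prints "total: 5.9B, active per token: 1.7B"
```

With these assumed numbers, an "8x1B"-looking model would need memory for about 5.9B parameters but would run each token through only about 1.7B of them, which is the usual trade-off behind the small-MoE question above.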



