

andai on April 17, 2024 | on: Mixtral 8x22B

Has anyone tried MoE at smaller scales? e.g. a 7B model that's made of a bunch of smaller ones? I guess that would be 8x1B. Or would that make each expert too small to be useful? TinyLlama is 1B and it's almost useful! I guess 8x1B would be Mixture of TinyLLaMAs...

samus on April 17, 2024 | [–]

There is Qwen1.5-MoE-A2.7B, which was made by upcycling the weights of Qwen1.5-1.8B, splitting it and finetuning it.

jasonjmcghee on April 17, 2024 | [–]

Yes, there are many fine-tunes on Hugging Face. Search "8x1B huggingface".

auspiv on April 17, 2024 | [–]

The previous Mixtral is 8x7B.
