
None of the 3B and 7B models are at ChatGPT’s level, let alone GPT-4. The 13B models start doing really interesting things, but you don’t get near ChatGPT results until you move up to the best 30B and 65B models, which require beefier hardware. Nothing out there right now approximates GPT-4.

The big story here for me is that the difference in training set is what makes the difference in quality. There is no secret sauce; the open-source architectures do well, provided you give them a large and diverse enough training set. That would mean it is just a matter of pooling resources to train really capable open-source models. That makes what RedPajama is doing, compiling the best open dataset, very important for the future of high-quality open-source LLMs.

If you want to play around with this yourself, you can install oobabooga and figure out which model fits your hardware from the LocalLLaMA subreddit wiki. The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM. I’ve had lots of fun talking to 7B and 13B Alpaca and Vicuna models running locally.

https://www.reddit.com/r/LocalLLaMA/wiki/models/
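
If you'd rather skip the web UI, here's a minimal sketch using the llama-cpp-python bindings to run a quantized model on CPU. The model path, prompt format, and thread count are placeholders, not something from the wiki; point model_path at whatever 4-bit file actually fits your RAM.

```python
from llama_cpp import Llama

# Hypothetical path to a 4-bit quantized model downloaded separately.
llm = Llama(
    model_path="./models/vicuna-13b-q4_0.bin",
    n_ctx=2048,    # context window
    n_threads=8,   # CPU threads; tune to your machine
)

# Prompt format is model-dependent; this is just an Alpaca/Vicuna-style example.
out = llm(
    "### Human: Explain RedPajama in one sentence.\n### Assistant:",
    max_tokens=128,
    stop=["### Human:"],
)
print(out["choices"][0]["text"])
```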



LLaVA 13B is a great multimodal model that has first-class support in oobabooga too.

It's really fun to enable both the Whisper extension and the TTS extension and have two-way voice chats with your computer while being able to send it pictures as well. Truly mind-bending.

Quantized 30B models run at acceptable speeds on decent hardware and are pretty capable (see the toy quantization sketch below). My understanding is that the open-source community is iterating extremely fast on the small model sizes, getting the most out of them by pushing data quality higher and higher, and then plans to scale up to at least 30B-parameter models.

I really can't wait to see the results of that process. In the end you're going to have a 30B model that's totally uncensored and is a mix of Wizard + Vicuna. It's going to be a veryyyy capable model.
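
For intuition about what those quantized models are doing, here's a toy block-wise 4-bit quantizer in the spirit of llama.cpp's q4_0 format. The block size and rounding details are simplified assumptions, not the exact on-disk format.

```python
import numpy as np

def quantize_block_q4(w: np.ndarray):
    # One float scale per block; weights rounded to 4-bit signed integers.
    scale = float(np.abs(w).max()) / 7.0
    if scale == 0.0:
        scale = 1.0  # avoid divide-by-zero on an all-zero block
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return scale, q

def dequantize_block_q4(scale: float, q: np.ndarray) -> np.ndarray:
    return (scale * q).astype(np.float32)

# llama.cpp quantizes weights in small blocks like this one.
block = np.random.randn(32).astype(np.float32)
scale, q = quantize_block_q4(block)
err = np.abs(block - dequantize_block_q4(scale, q)).max()
print(f"max reconstruction error: {err:.3f}")  # small relative to the weights' scale
```

The payoff is that each weight costs roughly half a byte instead of two or four, which is what makes a 30B model fit in desktop RAM at all.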


I usually even prefer GPT-3.5, as it's faster and much cheaper. GPT-4 is great for hardcore logical reasoning, but when I want something that knows how to turn my lights on and switch the TV to a channel, it's overkill.


> The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM.

Bigger ones as well; you just have to wait longer. Nothing for real-time usage, but if you can wait 10-20 minutes, you can use them on CPU.


It's not even that bad. A Core i7-12700K with DDR5 gives me ~1 word per second on llama-30b, which is fast enough for real-time chat with some patience. And things are even better on M1/M2 Macs.
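
For a rough sense of why that's the ballpark: CPU generation is mostly memory-bandwidth bound, since each generated token has to stream through all of the quantized weights. A back-of-envelope sketch (the size and bandwidth numbers below are rough assumptions, not measurements):

```python
# Rough estimate: tokens/s ≈ effective memory bandwidth / model size,
# because every token read touches (roughly) all of the weights once.
def est_tokens_per_sec(model_size_gb: float, mem_bandwidth_gbps: float) -> float:
    return mem_bandwidth_gbps / model_size_gb

# ~20 GB of 4-bit 30B weights, desktop DDR5 at ~50 GB/s effective bandwidth (assumed)
print(est_tokens_per_sec(20, 50))   # ≈ 2.5 tokens/s, i.e. on the order of 1-2 words/s
```

M1/M2 Macs do better largely because their unified memory has much higher bandwidth than typical desktop DDR4/DDR5.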


The critical factor seems to be the ability to fit the whole model in RAM (the --mlock option in oobabooga). With Apple's RAM prices, most M1/M2 owners probably don't have the 32 GB of RAM required to fit a 4-bit 30B model.
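
For context on that 32 GB figure, a minimal back-of-envelope sketch. The ~4.5 bits per weight (4-bit values plus per-block scales) and the fixed overhead for KV cache and runtime are rough assumptions:

```python
def est_ram_gb(n_params_billion: float, bits_per_weight: float = 4.5,
               overhead_gb: float = 2.0) -> float:
    # Weight storage plus a fixed allowance for KV cache, context and the runtime.
    weights_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb

print(est_ram_gb(30))   # ~19 GB -> comfortable in 32 GB, doesn't fit alongside the OS in 16 GB
print(est_ram_gb(13))   # ~9 GB  -> fine on a 16 GB machine
```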


I have 64 GB RAM, but only a Ryzen 5 3600, and the larger models are very slow ;)


Do these RedPajama models work with llama.cpp?


The naming is confusing... these models are aiming to equal or beat LLaMA by reproducing the training data and methodology that was used for LLaMA.

But the actual model architecture is slightly different; it's based on Pythia.

I guess what is needed is a pythia.cpp: https://github.com/ggerganov/llama.cpp/issues/742



No, llama.cpp only works with LLaMA-based models, like base LLaMA, Alpaca, Vicuna, ...



