
I've noticed that Llama 2 + llama.cpp doesn't seem to use the GPU much. I tried a better GPU (more speed, more memory) and my inference speed didn't increase.


Make sure that you're telling it to use the GPU. How are you launching llama_cpp?


I was running it from the command line, with llama-7b.

I did some investigating, and I found that it doesn't start using the GPU unless it has a lot of input to process (such as a long prompt).
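For reference, a sketch of how GPU offload is typically enabled in llama.cpp (the model path here is illustrative, and the binary name varies by version). By default no layers are offloaded to the GPU, so generation runs on the CPU regardless of which GPU is installed; GPU-enabled builds may still use the GPU for large-batch prompt evaluation, which would match the long-prompt behavior described above.

```shell
# Build with GPU support (CUDA example from the Llama 2 era);
# without a GPU-enabled build, inference is CPU-only.
make LLAMA_CUBLAS=1

# Offload transformer layers to the GPU with -ngl / --n-gpu-layers.
# With the default of 0 layers offloaded, only prompt processing
# (large batches) may touch the GPU, not token-by-token generation.
./main -m ./models/llama-2-7b.gguf -ngl 32 -p "Hello"
```

Raising `-ngl` until all layers fit in VRAM is what actually speeds up generation; a faster GPU alone does nothing if zero layers are offloaded.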



