
I've noticed that Llama 2 + llama.cpp doesn't seem to use the GPU much. I tried a better GPU (more speed, more memory) and my inference speed didn't increase.


Make sure that you're telling it to use the GPU. How are you launching llama_cpp?


I was running it from the command line, with llama-7b.

I did some investigating, and I found that it doesn't start using the GPU unless it has a lot of input to process (such as a long prompt).
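For reference, a sketch of how GPU offload is typically enabled in llama.cpp (the model path here is illustrative, and the binary name varies by version). By default no layers are offloaded to the GPU, so generation runs on the CPU regardless of which GPU is installed; GPU-enabled builds may still use the GPU for large-batch prompt evaluation, which would match the long-prompt behavior described above.

```shell
# Build with GPU support (CUDA example from the Llama 2 era);
# without a GPU-enabled build, inference is CPU-only.
make LLAMA_CUBLAS=1

# Offload transformer layers to the GPU with -ngl / --n-gpu-layers.
# With the default of 0 layers offloaded, only prompt processing
# (large batches) may touch the GPU, not token-by-token generation.
./main -m ./models/llama-2-7b.gguf -ngl 32 -p "Hello"
```

Raising `-ngl` until all layers fit in VRAM is what actually speeds up generation; a faster GPU alone does nothing if zero layers are offloaded.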



