
That's weird. I cap out at 512x512 on my 16GB Ampere card. Even stepping down precision doesn't help. I wonder what's different.

I use it directly from Python.



You might have n_samples (aka batch size) set to a number greater than 1? That basically multiplies the amount of VRAM you're using by the batch size.

I can generate a 512x512 on my 10GB 3080 no problem (or three 384x384 at a time).
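To see why batch size matters, here's a rough back-of-the-envelope sketch of how the per-step latent memory scales. The helper function and constants are illustrative assumptions (4 latent channels, fp16, latents at 1/8 image resolution, which matches stock Stable Diffusion), not measured numbers; real VRAM use is dominated by UNet activations, but those also scale roughly linearly with n_samples.

```python
def latent_bytes(width, height, n_samples, channels=4, dtype_bytes=2):
    # Hypothetical estimator: Stable Diffusion latents are 1/8 the image
    # resolution per dimension, with `channels` channels per sample,
    # stored in fp16 (2 bytes) by default.
    return n_samples * channels * (width // 8) * (height // 8) * dtype_bytes

one = latent_bytes(512, 512, n_samples=1)
three = latent_bytes(512, 512, n_samples=3)
assert three == 3 * one  # memory grows linearly with batch size
```

So dropping n_samples to 1 (and generating images in a loop instead) is usually the first thing to try when you hit out-of-memory errors.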


Try the hlky fork: https://github.com/hlky/stable-diffusion

It measures memory usage as well.


Perhaps a GPU architecture that isn't well optimized for yet? Just a guess based on other graphics workloads.



