How did you distribute the weights between CPU and GPU? Thanks | Hacker News


wensheng on April 5, 2023 | on: Using mmap to make LLaMA load faster

How did you distribute the weights between CPU and GPU? Thanks

adeon on April 5, 2023

See my response on the sibling comment; I did it in a custom Rust implementation.
