
The 30-A3B model gives 13 t/s without a GPU (I noticed that tokens/sec * # of active params * bytes per param roughly matches memory bandwidth).
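To make that rule of thumb concrete, here is a rough back-of-the-envelope sketch. It assumes decoding is purely memory-bound and that every active weight is read once per token. The 3B active-parameter count (my reading of "A3B") and the bytes-per-parameter values are assumptions, not figures from the comment, and the function names are made up for illustration.

```python
def implied_bandwidth_gb_s(tokens_per_sec: float,
                           active_params_billion: float,
                           bytes_per_param: float) -> float:
    """Memory bandwidth (GB/s) implied by an observed decode speed.

    Assumes each generated token reads every active weight exactly once.
    """
    # billions of params * bytes per param = GB read per token
    gb_per_token = active_params_billion * bytes_per_param
    return tokens_per_sec * gb_per_token


def estimated_tokens_per_sec(bandwidth_gb_s: float,
                             active_params_billion: float,
                             bytes_per_param: float) -> float:
    """Rough upper bound on decode speed for a given memory bandwidth."""
    gb_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # 13 t/s is from the comment; 3B active params and the quantization
    # levels below are assumptions for illustration only.
    for label, bytes_per_param in [("8-bit", 1.0), ("4-bit", 0.5)]:
        bw = implied_bandwidth_gb_s(13, 3, bytes_per_param)
        print(f"{label}: ~{bw:.1f} GB/s implied")
```

Real throughput will usually land below this bound, since KV-cache reads, compute, and cache behavior are all ignored here.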


Something like 21 t/s on pure CPU on a mini PC that's <2 years old.



