@cstromblad Not yet - there's been no GPTQ 4-bit version. Seems some might be appearing now though, so I'll find some time to test one out.
Could CPU offload bigger models, but the chatbot needs to be fast enough to hold the kids' attention ;)