That would be cool for this website, but don't try it in real life. If you win, it is almost guaranteed that you will share your prize with many other people.
When the U.K. lottery launched you picked 6 from 49.
The prizes were shared amongst everyone who matched the winning numbers. If the prize pot was, say, £10m and there were 4 winners, each would get £2.5m.
I suspect that the number of people choosing 1 2 3 4 5 6 would be far higher than the number choosing 6 random numbers. So in the event your numbers did come up, your winnings would be far smaller than those of someone who had the same odds but chose, say, 13 22 32 35 40 42.
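The effect of a shared jackpot is easy to see with a bit of arithmetic. A quick sketch (the winner counts here are made up for illustration):

```python
# Sketch: how an equally split jackpot shrinks as winner count grows.
# Winner counts are illustrative, not real lottery statistics.

def share_of_pot(pot: float, winners: int) -> float:
    """Each jackpot winner gets an equal split of the pot."""
    return pot / winners

pot = 10_000_000  # £10m prize pot

print(share_of_pot(pot, 4))       # 4 winners -> £2.5m each
# If thousands of people also played 1 2 3 4 5 6:
print(share_of_pot(pot, 10_000))  # 10,000 winners -> £1,000 each
```

Same odds of winning either way; the popular combination just dilutes the payout.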
How much computing power would one need to get this working completely local running a half decent llm fine tuned to sound like santa with all tts, stt and the pipecat inbetween?
I started looking into this with a Pi 5. It seemed like it was not quite performant enough. But I'm not an expert with these things and maybe someone else could make it work. We definitely have the technology to pull this off in this form factor. It would just be really expensive (maybe $500) and might also get a little hot.
If I was building it to be 'local only' I would run the inference on a remote host in my house.
Having a microcontroller in the phone is nice because it is WAY less likely to break. I love being able to flash a simple firmware and change things without fighting it too much.
Oh! Also I do all the 'WebRTC/AI dev' in the browser. Only when I get it working how I like do I switch over to doing the microcontroller stuff.
This repo is one possible starting point for tinkering with local agents on macOS. I've got versions of this for NVIDIA platforms but I tend to gravitate to using LLMs that are too big to fit on most NVIDIA consumer cards.
That's not true. You could run such an LLM on a lower end laptop GPU, or a phone GPU. Very low power and low space. This isn't 2023 anymore, a Santa-specific LLM would not be so intensive.
I've done a fair amount of fine-tuning for conversational voice use cases. Smaller models can do a really good job on a few things: routing to bigger models, constrained scenarios (think ordering food items from a specific and known menu), and focused tool use.
But medium-sized and small models never hit that sweet spot between open-ended conversation and reasonably on-the-rails responsiveness to what the user has just said. We don't yet know how to build models <100B parameters that do that. Seems pretty clear that we'll get there, given the pace of improvement. But we're not there yet.
Now maybe you could argue that a kid is going to be happy with a model that you train to be relatively limited and predictable. And given that kids will talk for hours to a stuffie that doesn't talk back at all, on some level this is a fair point! But you can also argue the other side: kids are the very best open-ended conversationalists in the world. They'll take a conversation anywhere! So giving them an 8B parameter, 4-bit quantized Santa would be a shame.
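The "routing to bigger models" pattern mentioned above can be sketched roughly like this. The function names are hypothetical stand-ins; in practice the classifier would be a call to a small fine-tuned model returning one of a fixed set of labels:

```python
# Hedged sketch of a small model acting as a router: it only decides
# where a request should go, it doesn't answer open-ended questions.
# classify_with_small_model is a hypothetical stand-in for an LLM call.

def classify_with_small_model(user_text: str) -> str:
    """Stand-in classifier; a real one would be a small LLM returning
    one label from a fixed set."""
    if any(w in user_text.lower() for w in ("order", "menu", "pizza")):
        return "constrained_ordering"
    return "open_conversation"

def route(user_text: str) -> str:
    label = classify_with_small_model(user_text)
    if label == "constrained_ordering":
        return "small-model"  # on-rails flow the small model handles well
    return "big-model"        # open-ended chat goes to the larger model

print(route("I'd like to order a pizza from the menu"))  # small-model
print(route("Tell me a story about space"))              # big-model
```

The point is that the small model only has to be reliable at a narrow classification task, which is exactly where small fine-tuned models do well.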
I worked on such a tool by extending pyqtgraph's flowcharts: mainly executing nodes when their inputs change and drawing a glowing box around the one currently running. The flowchart module already supports multiple terminals, and I added some custom nodes. Node execution has to be delegated to a background thread so the GUI isn't blocked while a node runs. Double-clicking a node opens a window for configuring it. One custom class of nodes opens a simple Python editor on double-click; on execution, the terminals automatically pick up the values of the variables carrying the same names as the terminals. It was a lot of fun. I used petl and pyspark for lazy loading, so you could build large ETL workflows.
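A minimal sketch of the two ideas in that comment, stripped of pyqtgraph specifics (this is not pyqtgraph's actual API; `run_code_node` and the terminal names are made up here): the node's user-written code is exec'd in a namespace seeded with the input terminal values, output terminals are read back by variable name, and the whole thing runs off the GUI thread so the UI stays responsive.

```python
from concurrent.futures import ThreadPoolExecutor

def run_code_node(code: str, inputs: dict, output_terminals: list) -> dict:
    """Exec the node's user-written code. Input terminal values appear
    as variables; output terminals pick up same-named variables."""
    namespace = dict(inputs)  # input terminals become local variables
    exec(code, {}, namespace)
    return {t: namespace.get(t) for t in output_terminals}

# Run the node in a background thread so a GUI event loop isn't blocked.
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(
        run_code_node,
        "total = a + b",      # code typed into the node's editor
        {"a": 2, "b": 3},     # values arriving on input terminals
        ["total"],            # output terminal names
    )
    print(future.result())    # {'total': 5}
```

In a real Qt app you'd signal back to the main thread when the future completes (e.g. via a queued signal) rather than blocking on `result()`.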