Hex is my new favorite STT on macOS. It also uses Parakeet V3. I didn't think it could possibly be faster than Handy, but it is much faster - even long ramblings are transcribed within a second. It's macOS-only and leverages CoreML / the Apple Neural Engine.
For local speech-to-text, Whisper remains the gold standard - you can run it locally with good accuracy across languages. For speech-to-speech, you'd typically chain Whisper with a local TTS model like Coqui TTS or use something like Tortoise TTS for higher quality but slower processing. The key is balancing accuracy, speed, and resource usage based on your specific use case. If you're doing content creation workflows, consider what post-processing you might need - sometimes the raw transcription needs structure and enhancement beyond just accurate words.
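The chain described above can be sketched as three swappable stages. This is a minimal sketch, not any particular tool's API: the stage names and callables here are placeholders, where in practice `stt` might wrap openai-whisper's `model.transcribe()` and `tts` might wrap Coqui TTS's `tts_to_file()`.

```python
def speech_to_speech(audio_path, stt, transform, tts):
    """Run audio through STT -> text cleanup -> TTS and return the result.

    All three stages are injected as plain callables so you can trade off
    accuracy, speed, and resource usage per stage (e.g. a small Whisper
    model for speed vs. a large one for accuracy).
    """
    text = stt(audio_path)      # e.g. Whisper transcription
    cleaned = transform(text)   # e.g. punctuation / grammar post-processing
    return tts(cleaned)         # e.g. Coqui TTS or Tortoise TTS synthesis

# Usage with stub stages, just to show the data flow:
out = speech_to_speech(
    "in.wav",
    stt=lambda path: "hello world",
    transform=str.title,
    tts=lambda text: f"spoke: {text}",
)
# out == "spoke: Hello World"
```

Keeping the stages decoupled like this is what makes the speed/quality trade-off (Coqui for speed, Tortoise for quality) a one-line swap.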
+1 on the post-processing point. Raw Whisper output is ~90% there, but punctuation, grammar, and formatting are the missing pieces.
I built MumbleFlow to address exactly this — whisper.cpp for STT plus llama.cpp for smart text cleanup, all running on-device. Metal/CUDA accelerated, sub-second latency on Apple Silicon. Global hotkey works in any app.
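The "smart text cleanup" step can be illustrated with a small sketch. This is not MumbleFlow's actual code; the prompt wording is an assumption, and the LLM is injected as a generic callable (e.g. a llama-cpp-python `Llama` instance wrapped to return a string) so the sketch stays backend-agnostic.

```python
# Hypothetical cleanup prompt - the exact wording is an assumption.
CLEANUP_PROMPT = (
    "Fix punctuation, capitalization, and obvious grammar errors in this "
    "transcript. Do not change the wording:\n\n{raw}"
)

def cleanup_transcript(raw: str, generate) -> str:
    """Post-process raw STT text with a local LLM.

    `generate` is any text-completion callable; with llama-cpp-python you
    would wrap a Llama instance so it takes a prompt and returns a string.
    """
    return generate(CLEANUP_PROMPT.format(raw=raw)).strip()
```

The constraint "do not change the wording" matters: without it, small local models tend to paraphrase the transcript instead of just repairing it.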
For anyone interested in seeing how dithering can be pushed to its limits, play 'Return of the Obra Dinn'. After that, dithering will always remind you of this game.
It's intended, aesthetically, to remind you of Atkinson dithering (https://en.wikipedia.org/wiki/Atkinson_dithering), a variant of Floyd-Steinberg dithering often used in graphics for the black-and-white Macintosh.
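The Atkinson variant is easy to state in code. Unlike Floyd-Steinberg, it diffuses only 6/8 of the quantization error (each of six neighbors gets 1/8, and 2/8 is simply dropped), which is what produces the characteristic high-contrast, slightly washed-out classic-Mac look. A minimal sketch over a grayscale numpy array:

```python
import numpy as np

def atkinson_dither(gray, threshold=128):
    """Atkinson error-diffusion dithering of a 2-D grayscale array (0-255).

    Each pixel is quantized to black or white; 1/8 of the error goes to
    each of six nearby pixels, and the remaining 2/8 is discarded.
    """
    img = gray.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    # (dy, dx) offsets of the six neighbors that each receive error/8
    offsets = [(0, 1), (0, 2), (1, -1), (1, 0), (1, 1), (2, 0)]
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255 if old >= threshold else 0
            out[y, x] = new
            err = (old - new) / 8.0
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    img[ny, nx] += err
    return out
```

Running it on a flat mid-gray image yields the familiar scattered black-and-white pattern; pure black and pure white regions pass through unchanged, since their quantization error is zero.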
I think that local search, retrieval, and filing will become much easier with LLMs.
There are already tools and products in the market that allow you to rename and organize files. I believe this is the future.
We have developed various systems over decades, but I anticipate with LLMs it'll be so easy to file and retrieve things that we won't even have to think about it.
More than a Roman character, Catalina reminded me of "Howard Roark" from The Fountainhead. It's been at least a decade since I last read it, but I thought the movie was quite influenced by that book.
I've only seen the posters, an early trailer, and a few paragraphs about it here and there, but I've been surprised this comparison hasn't come up more often. Looking at the poster and the plot synopsis, it's all I can think of.
Looks promising, but after looking at the website I'm yearning to learn more about it! How does it compare to alternatives? What's the performance like? There isn't enough to push me to stop using ChatGPT and use this instead. Offline is good, but to get users at scale there has to be a compelling reason to switch. I don't think that offline capabilities alone are going to win over a significant number of users.
Another tip: I try out a new chat interface to LLMs almost every week, and they're free to use initially. There isn't a compelling reason for me to spend $10 from the get-go on a use case I'm not sure about yet.
The compelling reason to shift to local/decentralized AI is that all compute will soon be AI, and that means your entire existence will go into it. The question you should ask yourself is: do you want everything about you being handled by Sam Altman, Google, Microsoft, etc.? Do you want all of your compute dependent on them always being up, and do you want to trust their security team with your life? Do you want to still be using closed/centralized/hosted AI when truly open AI surpasses all of them in performance and capability? If you have children or family, do you want them putting their entire lives in the hands of those folks?
Decentralized AI will eventually become p2p and swarmed and then the true power of agents and collaboration will soar via AI.
Anyway, excuse the soapbox, but there are zero valid reasons for supporting and paying centralized keepers of AI that rarely share, collaborate, or give back to the community that made what they have possible.
> when truly open AI surpasses all of them in performance and capability.
Is this true?
I tried Llama last year and it was not very helpful.
GPT-4 is already full of problems and I have to keep working around them, so using something less capable doesn't get me too excited.
It's been 4 months and I'm pretty happy with the following setup: Pi-hole + Raspberry Pi + Tailscale
With Pi-hole running on my tailnet, all my devices use it by default as long as they're on the same tailnet. That way I get seamless ad-blocking even when I'm on cellular data or my friends' wifi networks.
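For anyone wanting to replicate this, the setup is roughly the following config fragment. This is a hedged sketch of my understanding, not an official guide; the tailnet IP shown is a placeholder, and the DNS step happens in the Tailscale admin console rather than on the command line.

```shell
# On the Raspberry Pi: install Pi-hole (official installer)
curl -sSL https://install.pi-hole.net | bash

# Join the Pi to your tailnet. --accept-dns=false keeps the Pi from
# pointing at itself for DNS, avoiding a resolution loop.
sudo tailscale up --accept-dns=false

# On every other device: just join the same tailnet.
sudo tailscale up

# Then, in the Tailscale admin console (DNS settings):
#   1. Add the Pi's tailnet IP (e.g. 100.x.y.z - placeholder) as a
#      global nameserver.
#   2. Enable "Override local DNS" so devices use it even on
#      cellular data or untrusted wifi.
```

The "Override local DNS" toggle is what makes the ad-blocking follow you off your home network: clients route DNS queries over the tailnet to the Pi-hole regardless of what the local network advertises.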