Hacker News | raajg's comments

I've been dabbling with STT quite a bit and built my own tool using Deepgram. But just tried Handy and it's SO FREAKING FAST! Love it.


Hex is my new favorite STT on macOS. It also uses Parakeet V3. I didn't think it could possibly be faster than Handy, but it is much faster - even long ramblings are transcribed within a second. It's macOS only and leverages Core ML / the Apple Neural Engine.

https://github.com/kitlangton/Hex

Also, the transcriptions with Hex don't seem to suffer from some of the issues with Handy, such as stutters.


For local speech-to-text, Whisper remains the gold standard - you can run it locally with good accuracy across languages. For speech-to-speech, you'd typically chain Whisper with a local TTS model like Coqui TTS or use something like Tortoise TTS for higher quality but slower processing. The key is balancing accuracy, speed, and resource usage based on your specific use case. If you're doing content creation workflows, consider what post-processing you might need - sometimes the raw transcription needs structure and enhancement beyond just accurate words.
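To make the Whisper-plus-post-processing idea concrete, here's a minimal Python sketch. It assumes the `openai-whisper` package (`pip install openai-whisper`) plus ffmpeg for decoding; the model size, file path, and the filler-word cleanup helper are illustrative choices of mine, not something from the comments above.

```python
import re

# Illustrative filler-word list; tune for your own speech habits.
FILLERS = {"um", "uh", "erm"}

def clean_transcript(text: str) -> str:
    """Drop common filler words, collapse whitespace, capitalize the start."""
    words = [w for w in text.split() if w.lower().strip(",.") not in FILLERS]
    cleaned = re.sub(r"\s+", " ", " ".join(words)).strip()
    return cleaned[:1].upper() + cleaned[1:] if cleaned else cleaned

def transcribe_and_clean(path: str, model_size: str = "base") -> str:
    # Requires openai-whisper and ffmpeg; the model downloads on first use.
    # Local import so the cleanup helper stays usable without Whisper installed.
    import whisper
    model = whisper.load_model(model_size)  # "base" trades accuracy for speed
    return clean_transcript(model.transcribe(path)["text"])
```

The cleanup step is deliberately separate from the STT call, since (as the next comment notes) the raw transcription and the post-processing are really two different problems.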

+1 on the post-processing point. Raw Whisper output is ~90% there, but punctuation, grammar, and formatting are the missing pieces.

I built MumbleFlow to address exactly this — whisper.cpp for STT plus llama.cpp for smart text cleanup, all running on-device. Metal/CUDA accelerated, sub-second latency on Apple Silicon. Global hotkey works in any app.

$5 one-time, no cloud, no subscription. https://mumble.helix-co.com


Yes especially with Parakeet V3. It’s also nicely hackable, I Clauded a couple PRs to improve the experience, like removing stutters and filler words.


This was recently shared on HN: https://visualrambling.space/dithering-part-1/

For anyone interested in seeing how dithering can be pushed to the limits, play 'Return of the Obra Dinn'. Dithering will always remind you of this game after that.

- https://visualrambling.space/dithering-part-1

- https://store.steampowered.com/app/653530/Return_of_the_Obra...


On Return of the Obra Dinn's dithering specifically, here is the original developer blog on its dithering: https://forums.tigsource.com/index.php?topic=40832.msg136374...

It's intended, aesthetically, to remind you of Atkinson dithering (https://en.wikipedia.org/wiki/Atkinson_dithering), a variant of Floyd-Steinberg dithering often used in graphics for the black-and-white Macintosh.
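For anyone curious how Atkinson dithering actually works: each pixel is thresholded to black or white, and only six eighths of the quantization error is pushed forward to neighbouring pixels (the remaining 2/8 is discarded, which gives the characteristic high-contrast Mac look). A minimal grayscale sketch in Python, with the function and parameter names being my own:

```python
def atkinson_dither(pixels, width, height):
    """1-bit Atkinson dithering of a flat list of grayscale values (0-255)."""
    buf = [float(p) for p in pixels]
    out = [0] * (width * height)
    # The six forward neighbours that each receive 1/8 of the error.
    neighbours = [(1, 0), (2, 0), (-1, 1), (0, 1), (1, 1), (0, 2)]
    for y in range(height):
        for x in range(width):
            i = y * width + x
            new = 255 if buf[i] >= 128 else 0  # threshold at mid-gray
            out[i] = new
            err = (buf[i] - new) / 8.0  # only 6/8 of this is distributed
            for dx, dy in neighbours:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    buf[ny * width + nx] += err
    return out
```

Because part of the error is thrown away, flat mid-gray regions come out lighter and punchier than with Floyd-Steinberg, which diffuses all of it.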


Time to set up Pi-hole on my Raspberry Pi 4

https://pi-hole.net/


I think that local search, retrieval, and filing will become much easier with LLMs.

There are already tools and products in the market that allow you to rename and organize files. I believe this is the future.

We have developed various systems over decades, but I anticipate that with LLMs it'll be so easy to file and retrieve things that we won't even have to think about it.


42

Ref: google.com/search?q=answer+to+life+the+universe+and+everything


More than a Roman character, Catalina reminded me of "Howard Roark" from The Fountainhead. It's been at least a decade since I last read it, but I thought the movie was quite influenced by that book.


I've only seen the posters, an early trailer, and a few paragraphs about it here and there, but I've been surprised this comparison hasn't come up more often. Looking at the poster and the plot synopsis, it's all I can think of.


Oh, no, not The Fountainhead.

(The 1949 movie version is amusing today. Roark's architecture is bad early brutalism, now a cliche. The art deco office interiors are great.)


He's said several times that his primary influence in the last decade(?) of rewrites has been David Graeber and David Wengrow.


Been using the LLM cli by simonw and love it.

https://github.com/simonw/llm

https://llm.datasette.io/en/stable/

Pro tip: use command substitution with $(pbpaste) to inject clipboard contents into a prompt, e.g. llm "summarize: $(pbpaste)"


I don't have pbcopy and pbpaste on my machine but injecting clipboard sounds interesting.


I just 'star' the repos


Looks promising, but after looking at the website I'm left wanting to know more. How does it compare to alternatives? What's the performance like? There isn't enough there to push me to stop using ChatGPT and use this instead. Offline is good, but to get users at scale there has to be a compelling reason to switch; I don't think offline capability alone will attract a significant number of users.

Another tip: I try out a new chat interface to LLMs almost every week, and they're free to use initially. There isn't a compelling reason for me to spend $10 from the get-go on a use case I'm not sure about yet.


The compelling reason to shift to local/decentralized AI is that all of compute will soon be AI, and that means your entire existence will go into it. The question you should ask yourself is: do you want everything about you handled by Sam Altman, Google, Microsoft, etc.? Do you want all of your compute dependent on them always being up, and do you want to trust their security team with your life? Do you want to still be using closed/centralized/hosted AI when truly open AI surpasses all of them in performance and capability? If you have children or family, do you want them putting their entire lives in the hands of those folks?

Decentralized AI will eventually become p2p and swarmed and then the true power of agents and collaboration will soar via AI.

Anyway, excuse the soapbox, but there are zero valid reasons for supporting and paying centralized keepers of AI that rarely share, collaborate, or give back to the community that made what they have possible.


> when truly open AI surpasses all of them in performance and capability.

Is this true? I tried LLaMA last year and it was not very helpful. GPT-4 is already full of problems that I have to keep working around, so using something less capable doesn't get me too excited.


Maybe this isn't for everyone, just the people who place a high value on privacy.


If your ultimate goal is privacy, then you should only be looking at open source chat UI front ends:

https://github.com/mckaywrigley/chatbot-ui

https://github.com/oobabooga/text-generation-webui

https://github.com/mudler/LocalAI

And then connecting them to offline model servers:

- Ollama

- llama.cpp

And you should avoid closed source frontends:

- Recurse

- LM Studio

And closed-source models:

- ChatGPT

- Gemini
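For what it's worth, wiring any of the open frontends above to a local model server mostly comes down to hitting the server's HTTP API. A rough sketch against Ollama's default endpoint follows; the URL is Ollama's documented default, while the model name and helper functions are just illustrative:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> bytes:
    # Ollama's generate API takes a JSON body; stream=False returns
    # the whole response in one blob instead of newline-delimited chunks.
    return json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    # Blocks until the local server finishes generating.
    req = request.Request(OLLAMA_URL, data=build_request(model, prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Keeping the payload construction separate makes it easy to see that nothing here ever leaves localhost, which is the whole point of this stack.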


Are you implying Claude is an open source model?


I don't think the list was meant to be exhaustive.


But how can I guarantee this app is private?

I'm assuming I cannot block internet access to the app because it needs to verify App Store entitlement.


I mean, OK, then how do you distinguish yourself from LM Studio (free)?


Been 4 months and I'm pretty happy with the following setup: Pi-hole + Raspberry Pi + Tailscale.

With Pi-hole running on the tailnet, all my devices use it for DNS by default as long as they're connected. That way I have seamless ad blocking even when I'm on cellular data or on my friends' Wi-Fi networks.

