macOS already has great built-in TTS capability: the OS seems to include a natural-sounding voice. I recently built a similar tool that just runs the "say" command as a background process. I had to wrap it in a Deno server. It works, but on Tahoe it's difficult to consistently configure it to use that one natural voice rather than the subpar voices downloadable in the settings. The good voice seems to be hidden somehow.
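For anyone curious, here is a rough sketch of the "run say in the background" part. It is not the actual tool: `buildSayArgs`, `speakInBackground`, and `SayOptions` are hypothetical names made up for illustration. It only assumes the documented `-v` (voice) and `-r` (rate, words per minute) flags of `say`, and it uses `node:child_process`, which Deno also supports.

```typescript
import { spawn } from "node:child_process";

// Hypothetical options type for this sketch.
interface SayOptions {
  voice?: string; // passed to `say -v`; when omitted, `say` uses the system voice
  rate?: number;  // passed to `say -r`, in words per minute
}

// Build the argument list for `say`. Pure function, so it is easy to check.
function buildSayArgs(text: string, opts: SayOptions = {}): string[] {
  const args: string[] = [];
  if (opts.voice) args.push("-v", opts.voice);
  if (opts.rate) args.push("-r", String(opts.rate));
  // The text goes last. Note: text starting with "-" could be read as a flag.
  args.push(text);
  return args;
}

// Start `say` without waiting for it to finish (macOS only).
function speakInBackground(text: string, opts: SayOptions = {}): void {
  const child = spawn("say", buildSayArgs(text, opts), {
    detached: true,  // let the speech outlive the caller
    stdio: "ignore", // no pipes to keep open
  });
  child.unref(); // do not keep the event loop alive for this child
}
```

Calling `speakInBackground("Build finished")` with no `voice` leaves the choice to whatever system voice is selected in the OS settings. `say -v '?'` lists the voice names that can be passed explicitly.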
My mistake, it seems I was referring to the Siri voice, which appears to be the default. It sounds good. It is selectable and, to my surprise, even configurable in speed, pitch and volume in the OS Accessibility settings -> System Voice -> click the (i) symbol. (macOS Tahoe)
Just made it an MCP server so Claude can tell me when it's done with something :)
https://github.com/Marviel/speak_when_done