Eep. So, on my M1 Mac, I ran `uvx pocket-tts serve` and plugged in: > It was the best ...

Paul_S · 2026-01-16T09:08:25 1768554505

All the models I tried have similar problems. When trying to batch a whole audiobook, the only way I've found to catch this is to generate the audio, then run a speech-to-text model on it and check that you get the same text back.
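To make that check concrete, here is a minimal sketch of the comparison step only. It assumes you already have the transcript as a string from whatever speech-to-text model you use. The names `normalize` and `find_dropped_words` are made up for illustration and are not part of pocket-tts or any other library.

```python
import difflib
import re


def normalize(text):
    """Lowercase and drop punctuation so only the word sequence is compared."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_dropped_words(source_text, transcript):
    """Return runs of source words that are missing or altered in the transcript.

    source_text: the text that was sent to the TTS model.
    transcript:  what a speech-to-text model heard in the generated audio
                 (assumed to be produced elsewhere).
    """
    src = normalize(source_text)
    hyp = normalize(transcript)
    matcher = difflib.SequenceMatcher(a=src, b=hyp, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete": words only in the source; "replace": words that came out different.
        if tag in ("delete", "replace"):
            dropped.append(" ".join(src[i1:i2]))
    return dropped
```

An empty result means the word sequences match after normalization. Anything else is a span worth regenerating, though transcription errors will cause false alarms too, so treat hits as candidates rather than proof.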

vvolhejn · 2026-01-16T09:09:58 1768554598

Václav from Kyutai here. Thanks for the bug report! A workaround for now is to chunk the text into smaller parts, where the model is more reliable. We already do some chunking in the Python package. There is also a fancier way to do this chunking that ensures the stitched-together parts continue well (teacher-forcing), but we haven't implemented that yet.
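For anyone who wants to try the chunking workaround by hand, a rough sketch follows. This is not the chunking the Python package does; it is a naive sentence packer, and `chunk_text` and the `max_chars` default are assumptions picked for illustration.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk
    instead of being cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace; crude, but enough for a sketch.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the audio concatenated. Splitting on sentence boundaries keeps the seams at natural pauses, but it does nothing to keep prosody consistent across chunks, which is what the teacher-forcing approach would address.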

mgaudet · 2026-01-16T18:32:58 1768588378

Is this just sort of expected for these models? Should users of this expect only truncation or can hallucinated bits happen too?

I also find Javert in particular seems to put in huge gaps and spaces... side effect of the voice?

sbarre · 2026-01-16T03:16:07 1768533367

Yeah, Javert mangled those sentences for me as well. It skipped whole parts and also moved words around:

- "its noisiest superlative insisted on its being received"

Win10 RTX 5070 Ti

small_scombrus · 2026-01-16T04:16:18 1768536978

Using your first text block, 'Eponine' skips "we had nothing before us" and doesn't speak the final "that some of its noisiest"

I wonder what's going wrong in there

memming · 2026-01-16T07:37:28 1768549048

interesting; it skipped "we had everything before us," in my test. Yeah, not a good sign.