I haven't really tried them yet, but the models used seem to be much older and much worse than seamless-m4t-v2 [1], which is multimodal and supports these tasks:
Speech-to-speech translation (S2ST)
Speech-to-text translation (S2TT)
Text-to-speech translation (T2ST)
Text-to-text translation (T2TT)
Automatic speech recognition (ASR)
across 101 languages for speech input, 96 languages for text input/output, and 35 languages for speech output.
I tried it on low-resource language pairs like Thai to German, for both text and audio, and it works quite well.
[1] https://huggingface.co/facebook/seamless-m4t-v2-large
Unfortunately, interpreting "CC-BY-NC" as a software license, I think you'd be pirating if you used the linked models for anything you might sell.
(Bergamot is BY-SA, but I think the virality would only apply to derivative models and not to model outputs, whereas Facebook's NonCommercial clause might apply to usage of the original model itself, as it usually does in software licenses.)
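
For anyone who wants to try the text-to-text case mentioned above (Thai to German), here is a rough sketch using the Hugging Face transformers classes for this model. It assumes transformers, torch and sentencepiece are installed and that "tha" and "deu" are the right language codes. The helper names task_name and translate_text are made up for illustration and are not part of any library.

```python
# Task abbreviations from the comment, keyed by (input modality, output modality).
TASKS = {
    ("speech", "speech"): "S2ST",
    ("speech", "text"): "S2TT",
    ("text", "speech"): "T2ST",
    ("text", "text"): "T2TT",
}


def task_name(src_modality, tgt_modality, same_language=False):
    """Hypothetical helper: map modalities to the task abbreviation."""
    # Speech in, text out, in the same language is plain speech recognition.
    if same_language and (src_modality, tgt_modality) == ("speech", "text"):
        return "ASR"
    return TASKS[(src_modality, tgt_modality)]


def translate_text(text, src_lang, tgt_lang,
                   model_id="facebook/seamless-m4t-v2-large"):
    """Hypothetical helper: text-to-text translation (T2TT)."""
    # Imported lazily so the rest of the file works without transformers.
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    # The first call downloads several GB of weights.
    processor = AutoProcessor.from_pretrained(model_id)
    model = SeamlessM4Tv2Model.from_pretrained(model_id)

    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    # generate_speech=False asks for text tokens instead of a waveform.
    tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    return processor.decode(tokens[0].tolist()[0], skip_special_tokens=True)


# Example call (assumed language codes: "tha" = Thai, "deu" = German):
# print(translate_text("สวัสดีครับ", src_lang="tha", tgt_lang="deu"))
```

Keep the license in mind before using the output for anything commercial.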