Similar to the speech-to-text system (see sttd).
It was written for cases where all processing must stay local and remain fast.
Written in C, with libs:
lame,
speex DSP,
espeak,
onnx,
piper.
Runs on regular servers and responds fast enough for building real-time dialogue systems.
Price: $350 / 350 USDT
For purchase questions, please visit the contact page.
A trial period with installation on your servers is provided (Ubuntu 22.04 x64 preferred).
Doesn't depend on any online services; all data is processed locally
There are open models for various languages
You don't need to purchase or rent expensive hardware
Available in dialplan and scripts
There's a module for integration (mod_sivr_tts)
Saves memory and improves performance
Easy integration with various applications
- wav
- mp3
- Linux
Request:
curl -q http://127.0.0.1:8802/v1/speech -X POST -H "Authorization: Bearer secret" -H "Content-Type: application/json; charset=utf-8" -d '{"language":"en","samplerate":8000,"format":"mp3","input":"Hello, how can I help you?"}'
The response is an mp3 stream that you can save or play back.
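The same request can be sent from a script. Below is a minimal sketch in Python using only the standard library. The URL, token and JSON fields follow the curl request above, with the output format key assumed to be spelled `format`. The helper names `build_request` and `synthesize_to_file`, the 30-second timeout and the output path are illustrative choices, not part of the product.

```python
import json
import urllib.request

# Endpoint and token copied from the curl example; adjust for your installation.
TTS_URL = "http://127.0.0.1:8802/v1/speech"
TOKEN = "secret"


def build_request(text, language="en", samplerate=8000, fmt="mp3"):
    """Build a POST request carrying the same JSON fields as the curl example.

    Hypothetical helper: the key name "format" is an assumption.
    """
    payload = {
        "language": language,
        "samplerate": samplerate,
        "format": fmt,
        "input": text,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def synthesize_to_file(text, path, **kwargs):
    """Send the request and write the returned audio bytes to `path`.

    Hypothetical helper: assumes the server replies with raw audio in the body.
    """
    request = build_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        with open(path, "wb") as out:
            out.write(response.read())
```

With the service running, calling `synthesize_to_file("Hello, how can I help you?", "hello.mp3")` should leave the audio in `hello.mp3`. Building the request separately from sending it makes it possible to inspect the payload without a server.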