Speech recognition is a key component of voice interaction systems.
Many online services offer similar functionality, but most of them raise security concerns (data leaks) and can add delays to the interaction (slow responses).
As a result, a local solution is often needed.
This solution was written specifically for such cases.
Written in C, using the following libraries:
- mpg123
- alphacephei framework
Runs on regular servers and responds fast enough to build real-time dialogue systems.
Price: $350 / 350 USDT
For purchase questions, please visit the contact page.
A trial period, including installation on your servers, is available (Ubuntu 22.04 x64 preferred).
- Does not depend on any online services; all data is processed locally
- Open models are available for various languages
- There are tools for it
- Allows a dictionary of permitted words to be defined in the request
- Can generate a vector for speaker identification
- Example server: IBM x3550-M3
- Available in the dialplan and in scripts
- An integration module is available (mod_sivr_asr)
- Saves memory and improves performance
- Integrates easily with various applications
Supported audio formats:
- wav
- mp3
- l16
Supported OS:
- Linux
Example #1 (simple request)
Request:
curl http://127.0.0.1:8801/v1/transcriptions -X POST -H "Authorization: Bearer secret" -H "Content-Type: multipart/form-data" -F language="en" -F smodel="small" -F file="@test.mp3"
Response (json):
{
"text" : "hello world"
}
Example #2 (with speaker identification)
Request:
curl http://127.0.0.1:8801/v1/transcriptions -X POST -H "Authorization: Bearer secret" -H "Content-Type: multipart/form-data" -F language="en" -F smodel="small" -F vmodel="default" -F file="@test.mp3"
Response (json):
{
"spk" : [-0.644623, 1.023342, 2.575434, 0.623447, -0.602342, 1.0234234, -1.4824234, -0.021242, 0.824297, -0.152424, ... ],
"spk_frames" : 81,
"text" : "hello world"
}