Speech-to-Text system

A key component of voice interaction systems.
Many online services offer similar functionality, but most of them raise security concerns (data leaks) and can add latency to the interaction (slow responses).
Therefore, a local solution is often needed.
This system was written for exactly such cases.


Written in C, using the mpg123 library and the alphacephei framework.
Runs on regular servers and produces responses fast enough for building real-time dialogue systems.


Price: $350 / 350 USDT
For purchase questions, please visit the contact page.
A trial period with installation on your servers is provided (Ubuntu 22.04 x64 preferred).



Basic features:


--- Examples ---

Example #1 (simple request)

Request:
curl http://127.0.0.1:8801/v1/transcriptions -X POST -H "Authorization: Bearer secret" -H "Content-Type: multipart/form-data" -F language="en" -F smodel="small" -F file="@test.mp3"

Response (json):
{
 "text" : "hello world"
}
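
The same request can also be sent from a script. Below is a minimal Python sketch that uses only the standard library. The helper names build_multipart and transcribe are illustrative and not part of the product. The URL, token, and form field names are copied from the curl example above; the 30-second timeout is an assumption, and the response is assumed to be a JSON object with a "text" field as shown.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields and one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain form field (language, smodel, ...).
    for name, value in fields.items():
        parts.append((
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        ).encode("utf-8"))
    # The audio file part, sent as raw bytes.
    parts.append((
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://127.0.0.1:8801/v1/transcriptions",
               token="secret", language="en", smodel="small"):
    """POST an audio file and return the recognized text (hypothetical helper)."""
    with open(path, "rb") as fh:
        audio = fh.read()
    body, content_type = build_multipart(
        {"language": language, "smodel": smodel},
        "file", os.path.basename(path), audio)
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": content_type})
    # Timeout value is an assumption; tune it for your audio length.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

With a server running locally, transcribe("test.mp3") would return the recognized text from the response.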
        


Example #2 (with speaker identification)

Request:
curl http://127.0.0.1:8801/v1/transcriptions -X POST -H "Authorization: Bearer secret" -H "Content-Type: multipart/form-data" -F language="en" -F smodel="small" -F vmodel="default" -F file="@test.mp3"

Response (json):
{
 "spk" : [-0.644623, 1.023342, 2.575434, 0.623447, -0.602342, 1.0234234, -1.4824234, -0.021242, 0.824297, -0.152424, ... ],
 "spk_frames" : 81,
 "text" : "hello world"
}
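
One possible way to use the "spk" field is to compare the vectors returned for two recordings. The Python sketch below assumes that "spk" is a speaker embedding for which cosine distance is a meaningful similarity measure, which is a common convention for such vectors but is not stated here. The function names and the 0.5 threshold are hypothetical; a real threshold would have to be tuned on your own recordings.

```python
import json
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle) between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def same_speaker(response_a, response_b, threshold=0.5):
    """Guess whether two JSON responses come from the same speaker.

    The threshold is an illustrative assumption, not a documented value.
    """
    spk_a = json.loads(response_a)["spk"]
    spk_b = json.loads(response_b)["spk"]
    return cosine_distance(spk_a, spk_b) < threshold
```

A distance near 0 means the two vectors point in almost the same direction, and a distance near 2 means they point in opposite directions.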