16 engines. One SDK. Every platform. From wake word detection to speech synthesis — all running on-device with sub-millisecond latency and zero cloud dependency.
Every engine runs on-device. No cloud. No API keys. No per-inference fees. Ship voice features that work offline, everywhere.
Custom keyword detection. DS-CNN architecture, 20K params.
Real-time transcription with Zipformer + RNN-T.
File transcription with Whisper. Split encoder/decoder.
Natural on-device synthesis. Kokoro + Piper backends.
Neural VAD with Silero-compatible architecture.
ECAPA-TDNN embeddings. Biometric-grade accuracy.
Real-time speaker segmentation. Who spoke when.
RNNoise + DeepFilterNet. Crystal-clear audio.
Direct intent extraction. Voice commands to actions.
Clone any voice from samples. Self-service on-device.
Unified voice interaction: Wake, VAD, STT, Intent, TTS.
Mimi neural codec. 1.2-6.0 kbps adaptive bitrate.
Unified token streams with per-modality encryption.
Signal-compliant E2E encryption. X3DH + Double Ratchet.
REST + WebSocket. OpenAI/Deepgram-compatible API.
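As a back-of-envelope illustration of what the codec's 1.2-6.0 kbps range means in practice, the sketch below converts the advertised bitrates into bytes of payload per second of audio. It assumes "kbps" means kilobits per second and ignores framing overhead; the exact packetization is codec-specific.

```python
# Rough payload-per-second at the codec's advertised adaptive bitrates.
# Assumes "kbps" = kilobits per second; framing overhead is ignored.
def bytes_per_second(kbps: float) -> float:
    return kbps * 1000 / 8  # kilobits -> bits -> bytes

for kbps in (1.2, 3.0, 6.0):
    print(f"{kbps} kbps -> {bytes_per_second(kbps):.0f} bytes/s of audio")
```

At the low end that is roughly 150 bytes per second of speech, small enough to ride inside an encrypted messaging payload.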
This runs entirely on your device via WebAssembly. No audio leaves your browser. Zero network requests.
1.09 billion Arabic, Hindi, and Urdu speakers. Zero on-device voice AI competitors. Not an afterthought -- built into the foundation.
Full coverage: Wake word, STT, TTS, Intent. The formal register of 420M+ speakers.
Wake word + STT + TTS. Tuned for UAE, Saudi, Bahrain, Kuwait, Qatar, Oman.
100M+ speakers. The most widely understood Arabic dialect worldwide.
Syria, Lebanon, Jordan, Palestine. 30M+ speakers.
Install the SDK, load a model, process audio. Three steps. Production-ready voice AI in your app.
$ pip install sajspeak
One dependency. Python, Rust, Node.js, Swift, WASM, or C -- pick your binding.
WakeEngine("hey_saj.onnx")
Load a model, set a threshold. Use pre-trained models or train your own via Console.
engine.process(frame)
Runs on-device. No cloud. No API keys. No network latency. Ships with your binary.
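Putting the three steps together, here is a minimal sketch of the audio-feeding loop. Only `WakeEngine("hey_saj.onnx")` and `engine.process(frame)` come from the snippets above; the 16 kHz sample rate, 512-sample frame size, and 16-bit mono PCM format are illustrative assumptions, so check the SDK reference for the real values. The engine calls are left commented out so the framing logic runs standalone:

```python
import struct

SAMPLE_RATE = 16_000   # assumed engine input rate
FRAME_SAMPLES = 512    # assumed frame size (~32 ms at 16 kHz)

def frames(pcm_bytes: bytes, frame_samples: int = FRAME_SAMPLES):
    """Split little-endian 16-bit mono PCM into fixed-size sample tuples."""
    step = frame_samples * 2  # 2 bytes per int16 sample
    for off in range(0, len(pcm_bytes) - step + 1, step):
        yield struct.unpack(f"<{frame_samples}h", pcm_bytes[off:off + step])

# One second of silence -> 31 full 512-sample frames (16_000 // 512).
silence = bytes(2 * SAMPLE_RATE)
chunks = list(frames(silence))
print(len(chunks), len(chunks[0]))

# Each frame would then go to the engine (calls taken from the steps above):
# from sajspeak import WakeEngine
# engine = WakeEngine("hey_saj.onnx")
# for frame in chunks:
#     if engine.process(frame):
#         print("wake word detected")
```

In a real app the bytes would come from a microphone callback rather than a silence buffer, but the framing contract stays the same: fixed-size chunks, fed in order.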
More engines, better language support, open format, progressive pricing.
| Feature | Saj Speak | Picovoice |
|---|---|---|
| Voice Engines | 16 engines | 8 engines |
| Core Language | Rust (memory-safe) | C (closed-source) |
| Model Format | ONNX (open standard) | Proprietary (.ppn) |
| Arabic Support | Native (MSA + Gulf) | Limited |
| Hindi / Urdu | First-class | None |
| Voice Cloning | Self-service | Not available |
| Neural Speech Codec | 1.2-6.0 kbps | Not available |
| E2E Encryption | Signal-compliant | Not available |
| Free Tier | Wake Word + VAD + 2 models | Non-commercial only |
| Pricing Model | From $29/mo (5 tiers) | Per-device fees |
Start free. Pay only when you ship to production. No per-inference fees.
Your models run on your hardware. No audio-minute charges. No surprise bills.
Prototyping and learning
Solo devs and side projects
Teams shipping to production
Scale deployments and custom models
OEM, fleet, and large-scale
| Feature | Free | Indie | Pro | Business | Enterprise |
|---|---|---|---|---|---|
| **Engines** | | | | | |
| Wake Word | ✓ | ✓ | ✓ | ✓ | ✓ |
| VAD | ✓ | ✓ | ✓ | ✓ | ✓ |
| STT (Streaming) | -- | ✓ | ✓ | ✓ | ✓ |
| STT (Batch/Whisper) | -- | ✓ | ✓ | ✓ | ✓ |
| TTS | -- | ✓ | ✓ | ✓ | ✓ |
| Speaker ID | -- | ✓ | ✓ | ✓ | ✓ |
| Diarization | -- | -- | ✓ | ✓ | ✓ |
| Speech-to-Intent | -- | -- | ✓ | ✓ | ✓ |
| Voice Cloning | -- | -- | -- | ✓ | ✓ |
| **Models & Training** | | | | | |
| Pre-trained models | 2 | 5 | 77 | 77 | All + custom |
| Custom model training | -- | -- | 10/mo | Unlimited | Unlimited |
| **Languages** | | | | | |
| English | ✓ | ✓ | ✓ | ✓ | ✓ |
| Arabic (MSA + Gulf) | -- | -- | ✓ | ✓ | ✓ |
| Hindi / Urdu | -- | -- | ✓ | ✓ | ✓ |
| **Support** | | | | | |
| Community (Discord) | ✓ | ✓ | ✓ | ✓ | ✓ |
| | -- | ✓ | ✓ | ✓ | ✓ |
| Priority support | -- | -- | ✓ | ✓ | ✓ |
| Dedicated engineer | -- | -- | -- | -- | ✓ |
| SLA guarantee | -- | -- | -- | -- | ✓ |
Cloud voice APIs charge per minute, leak data, and add latency. On-device fixes all three.
Zero audio data leaves the device. No cloud processing. No data retention. GDPR/HIPAA/NESA compliance by architecture.
0.27ms wake word inference vs 200-500ms cloud round-trip. Real-time voice interaction without waiting for the network.
No per-minute charges. No audio-hour billing. Your models run on your hardware. One SDK license, unlimited inference.
Memory-safe Rust core. Open ONNX model format -- no vendor lock-in. Models small enough to embed in firmware. 476 tests, zero failures. 15 patents filed at IP Australia.
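The latency gap quoted above spans three orders of magnitude; the arithmetic, using the figures from this page:

```python
on_device_ms = 0.27      # quoted wake word inference time
cloud_ms = (200, 500)    # quoted cloud round-trip range

speedups = [rt / on_device_ms for rt in cloud_ms]
print([round(s) for s in speedups])  # roughly 740x to 1850x faster
```

Even at the optimistic end of the cloud range, the on-device path answers hundreds of times sooner than the first packet could return from a server.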
Join developers building private, intelligent voice experiences that run entirely on-device. No cloud. No latency. No compromise.