16 engines. One SDK. Every platform. From wake word detection to speech synthesis — all running on-device with sub-millisecond latency and zero cloud dependency.
Four principles. No compromise.
Your voice never leaves your device. Ever.
No cloud processing, no data retention, no telemetry. GDPR, HIPAA, and NESA compliant by architecture -- not by policy.
The only voice AI SDK with native Arabic from day one.
Not a translation. Not an afterthought. Arabic, Hindi, and Urdu built into the foundation alongside English.
One SDK. iOS, Android, Linux, Windows, macOS, Raspberry Pi, WASM, MCU.
Rust core with native bindings. Write once, deploy to phones, speakers, embedded devices, and browsers.
ONNX-based models. No vendor lock-in.
Standard ONNX models you own and control. Inspect, fine-tune, or replace -- your models are never trapped in a proprietary format.
Every engine runs on-device. No cloud. No API keys. No per-inference fees. Ship voice features that work offline, everywhere.
Custom keyword detection. Lightweight neural architecture, under 1MB models.
Real-time transcription with Zipformer + RNN-T.
File transcription with Whisper. Split encoder/decoder.
Natural on-device synthesis. Kokoro + Piper backends.
Neural VAD with Silero-compatible architecture.
ECAPA-TDNN embeddings. Biometric-grade accuracy.
Real-time speaker segmentation. Who spoke when.
RNNoise + DeepFilterNet. Crystal-clear audio.
Direct intent extraction. Voice commands to actions.
Clone any voice from samples. Self-service on-device.
Unified voice interaction: Wake, VAD, STT, Intent, TTS.
Mimi neural codec. 1.2-6.0 kbps adaptive bitrate.
Unified token streams with per-modality encryption.
Signal-compliant E2E encryption. X3DH + Double Ratchet.
REST + WebSocket. OpenAI/Deepgram-compatible API.
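The Mimi codec's bitrate range above invites a quick back-of-envelope check. A minimal sketch, assuming typical Opus voice bitrates of 24-32 kbps (an assumption for comparison; Opus itself supports a much wider range):

```python
# Back-of-envelope check on the Mimi codec's bandwidth at its low end,
# against assumed typical Opus voice bitrates (24-32 kbps).
mimi_kbps = 1.2                            # low end of the 1.2-6.0 kbps range
opus_low_kbps, opus_high_kbps = 24.0, 32.0 # assumed typical Opus voice rates

savings_low = opus_low_kbps / mimi_kbps    # 20.0x
savings_high = opus_high_kbps / mimi_kbps  # ~26.7x
kb_per_minute = mimi_kbps * 60 / 8         # kilobytes for one minute of audio

print(f"{savings_low:.0f}x to {savings_high:.1f}x less bandwidth than Opus")
print(f"{kb_per_minute:.1f} KB per minute at {mimi_kbps} kbps")
```

At 1.2 kbps, a full minute of voice fits in about 9 KB, which is what makes ultra-low-bandwidth calls practical.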
This runs entirely on your device via WebAssembly. No audio leaves your browser. Zero network requests.
Gulf Arabic in Riyadh. Egyptian in Cairo. Levantine in Beirut. One SDK. 420M+ Arabic speakers. 1.09 billion Arabic, Hindi, and Urdu speakers combined. Zero on-device voice AI competitors.
Full coverage: Wake word, STT, TTS, Intent. The formal register of 420M+ speakers.
Wake word + STT + TTS. Tuned for UAE, Saudi, Bahrain, Kuwait, Qatar, Oman.
100M+ speakers. The most widely understood Arabic dialect worldwide.
Syria, Lebanon, Jordan, Palestine. 30M+ speakers.
420M+ speakers. Zero on-device voice AI competitors.
Banks, telcos, and government agencies across the Gulf need voice AI that processes Arabic on-device. Saj Speak is the only SDK that delivers -- with dialect awareness, sovereign data processing, and NESA compliance built in.
Contact Enterprise Sales
Install the SDK, load a model, process audio. Three steps. Production-ready voice AI in your app.
$ pip install sajspeak
One dependency. Python, Rust, Node.js, Swift, WASM, or C -- pick your binding.
WakeEngine("hey_saj.onnx")
Load a model, set a threshold. Use pre-trained models or train your own via Console.
engine.process(frame)
Runs on-device. No cloud. No API keys. No network latency. Ships with your binary.
Built on proven open-source models and inference engines. ONNX Runtime and Rust -- battle-tested technology you can trust.
Models are continuously improved, with new languages, engines, and platform support shipping regularly. Growing developer community.
Extensive patent portfolio filed at IP Australia covering wake word detection, neural codecs, multimodal sensing, and more.
More engines, better language support, open format, progressive pricing.
| Feature | Saj Speak | Picovoice |
|---|---|---|
| Voice Engines | 16 engines | 8 engines |
| Core Language | Rust (memory-safe) | C (closed source) |
| Model Format | ONNX (open standard) | Proprietary (.ppn) |
| Arabic Support | Native (MSA + Gulf) | Limited |
| Hindi / Urdu | First-class | None |
| Voice Cloning | Self-service | Not available |
| Neural Speech Codec | 1.2-6.0 kbps | Not available |
| E2E Encryption | Signal-compliant | Not available |
| Free Tier | Wake Word + VAD + 2 models | Non-commercial only |
| Pricing Model | From $29/mo (5 tiers) | Per-device fees |
Start free. Pay only when you ship to production. No per-inference fees.
Your models run on your hardware. No audio-minute charges. No surprise bills.
Prototyping and learning
Solo devs and side projects
Teams shipping to production
Scale deployments and custom models
OEM, fleet, and large-scale
| Feature | Free | Indie | Pro | Business | Enterprise |
|---|---|---|---|---|---|
| Engines | | | | | |
| Wake Word | ✓ | ✓ | ✓ | ✓ | ✓ |
| VAD | ✓ | ✓ | ✓ | ✓ | ✓ |
| STT (Streaming) | -- | ✓ | ✓ | ✓ | ✓ |
| STT (Batch/Whisper) | -- | ✓ | ✓ | ✓ | ✓ |
| TTS | -- | ✓ | ✓ | ✓ | ✓ |
| Speaker ID | -- | ✓ | ✓ | ✓ | ✓ |
| Diarization | -- | -- | ✓ | ✓ | ✓ |
| Speech-to-Intent | -- | -- | ✓ | ✓ | ✓ |
| Voice Cloning | -- | -- | -- | ✓ | ✓ |
| Models & Training | | | | | |
| Pre-trained models | 2 | 5 | All | All | All + custom |
| Custom model training | -- | -- | 10/mo | Unlimited | Unlimited |
| Languages | | | | | |
| English | ✓ | ✓ | ✓ | ✓ | ✓ |
| Arabic (MSA + Gulf) | -- | -- | ✓ | ✓ | ✓ |
| Hindi / Urdu | -- | -- | ✓ | ✓ | ✓ |
| Support | | | | | |
| Community (Discord) | ✓ | ✓ | ✓ | ✓ | ✓ |
| | -- | ✓ | ✓ | ✓ | ✓ |
| Priority support | -- | -- | ✓ | ✓ | ✓ |
| Dedicated engineer | -- | -- | -- | -- | ✓ |
| SLA guarantee | -- | -- | -- | -- | ✓ |
Cloud voice APIs charge per minute, leak data, and add latency. On-device fixes all three.
Zero audio data leaves the device. No cloud processing. No data retention. GDPR/HIPAA/NESA compliance by architecture.
Sub-millisecond wake word inference vs 200-500ms cloud round-trip. Real-time voice interaction without waiting for the network.
No per-minute charges. No audio-hour billing. Your models run on your hardware. One SDK license, unlimited inference.
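The round-trip figures above translate directly into stale audio. A rough budget, assuming 32 ms frames at 16 kHz (illustrative values, not documented Saj Speak parameters):

```python
# Rough latency budget: how many frames of audio are already "in flight"
# by the time a cloud response returns. The 200-500 ms round-trip range
# comes from the comparison above; the 32 ms frame length is assumed.
frame_ms = 32
in_flight = {rtt: rtt // frame_ms for rtt in (200, 500)}
for rtt_ms, frames in in_flight.items():
    print(f"{rtt_ms} ms round trip -> {frames} frames of audio elapsed")
```

On-device inference answers within the current frame; a cloud hop means the user has already spoken 6-15 more frames before any result arrives.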
Sovereign encrypted communications -- built on the same voice AI engines that power Saj Speak.
E2E encrypted messaging, neural codec voice calls at ultra-low bandwidth, on-device meeting intelligence, and Arabic-first design. The "sovereign Slack" position is unoccupied -- Saj Link fills it.
X3DH + Double Ratchet. E2E for messages, calls, and media.
Crystal-clear voice at 1.2-6.0 kbps. 20-30x less bandwidth than Opus.
On-device transcription, translation, and meeting summaries.
Native RTL, dialect awareness, voice messages as rich cards.
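The Double Ratchet named above is built around a simple idea: derive each message key from a chain key, then advance the chain, so old keys can never be recomputed. A concept sketch of that symmetric chain step using stdlib HMAC (this omits the X3DH handshake and the Diffie-Hellman ratchet entirely; it illustrates the mechanism, not Signal's implementation):

```python
# Toy symmetric-key ratchet: each step derives a one-time message key
# from the current chain key, then replaces the chain key, giving
# forward secrecy -- compromising today's chain key reveals nothing
# about yesterday's message keys. Concept sketch only, not Signal.
import hashlib
import hmac

def kdf(chain_key: bytes, label: bytes) -> bytes:
    return hmac.new(chain_key, label, hashlib.sha256).digest()

def ratchet_step(chain_key: bytes):
    message_key = kdf(chain_key, b"msg")      # encrypt one message with this
    next_chain_key = kdf(chain_key, b"chain") # then advance the chain
    return next_chain_key, message_key

# Both parties start from the same shared secret (from the handshake).
alice = bob = hashlib.sha256(b"shared secret from handshake").digest()

alice, k1_a = ratchet_step(alice)
bob, k1_b = ratchet_step(bob)
assert k1_a == k1_b        # both sides derive the same message key
alice, k2_a = ratchet_step(alice)
assert k2_a != k1_a        # keys change every step
```

In the full protocol, a Diffie-Hellman ratchet periodically injects fresh entropy into this chain, which is what also gives post-compromise security.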
Memory-safe Rust core. Open ONNX model format -- no vendor lock-in. Models small enough to embed in firmware. Hundreds of tests, zero failures. Protected by an extensive patent portfolio.
Join 200+ developers building private, intelligent voice experiences that run entirely on-device. No cloud. No latency. No compromise.