On-Device Voice AI SDK

Build Voice AI
That Never Leaves
the Device

16 engines. One SDK. Every platform. From wake word detection to speech synthesis — all running on-device with sub-millisecond latency and zero cloud dependency.

476 tests passing | 15 patents filed | Open ONNX format
wake_word (Python)
from saj_speak import WakeEngine
 
engine = WakeEngine("hey_saj.onnx")
engine.threshold = 0.9
 
for frame in microphone.stream():
    if engine.process(frame):
        print("Wake word detected!")
wake_word (Rust)
use saj_speak::WakeEngine;
 
let mut engine = WakeEngine::from_model("hey_saj.onnx")?;
engine.set_threshold(0.9);
 
// Process audio frames
if engine.process(&audio_frame)? {
    println!("Wake word detected!");
}
wake_word (JavaScript / WASM)
import { WakeEngine } from '@saj-speak/wasm';
 
const engine = await WakeEngine.load('hey_saj.onnx');
engine.threshold = 0.9;
 
engine.onDetection(() => {
    console.log('Wake word detected!');
});
engine.start();
wake_word (Swift)
import SajSpeak
 
let engine = try WakeEngine(model: "hey_saj.onnx")
engine.threshold = 0.9
 
engine.onDetection { result in
    print("Detected: \(result.confidence)")
}
try engine.start()
<1ms Inference Latency
16 Voice Engines
77 Trained Models
107KB Wake Word Model
4 Languages (EN, AR, HI, UR)
Complete Platform

16 Voice Engines. One SDK.

Every engine runs on-device. No cloud. No API keys. No per-inference fees. Ship voice features that work offline, everywhere.

saj-wake

Wake Word

Custom keyword detection. DS-CNN architecture, 20K params.

0.27ms/inference | EN AR HI UR
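The 20K-parameter figure lines up with the 107KB model size quoted above. A back-of-envelope check, assuming float32 weights (the actual serialization format is not stated here):

```python
# Back-of-envelope: why a 20K-parameter DS-CNN fits in ~107KB.
# Assumes 4-byte float32 weights -- an assumption, not a stated spec.
params = 20_000
weight_bytes = params * 4           # 80,000 bytes of raw weights
weight_kb = weight_bytes / 1024     # ~78 KB

model_kb = 107
overhead_kb = model_kb - weight_kb  # ~29 KB left for ONNX graph/metadata

print(f"weights: {weight_kb:.0f} KB, overhead: {overhead_kb:.0f} KB")
```

The remainder is plausibly ONNX graph structure and metadata, which is consistent with an unquantized 20K-parameter network.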
saj-listen

Streaming STT

Real-time transcription with Zipformer + RNN-T.

Real-time | EN AR HI UR
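Streaming engines like this consume audio in fixed-size frames. A minimal pure-Python chunker illustrates the pattern; the 16 kHz sample rate and 20 ms frame length are illustrative assumptions, not saj-listen's documented parameters:

```python
# Split a mono PCM sample buffer into fixed-size frames for a streaming
# engine. 16 kHz / 20 ms frames are assumptions for illustration only.
SAMPLE_RATE = 16_000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame

def frames(samples):
    """Yield full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        yield samples[start:start + FRAME_LEN]

# One second of silence -> 50 frames of 320 samples each.
chunks = list(frames([0] * SAMPLE_RATE))
print(len(chunks), len(chunks[0]))
```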
saj-scribe

Batch STT

File transcription with Whisper. Split encoder/decoder.

Whisper | 100+ languages
saj-speak-tts

Text-to-Speech

Natural on-device synthesis. Kokoro + Piper backends.

Voice cloning | EN AR HI
saj-detect

Voice Activity

Neural VAD with Silero-compatible architecture.

<1ms latency | Universal
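saj-detect is a neural VAD; purely as a concept sketch, here is the classic energy-threshold heuristic that neural VADs replace (the threshold is arbitrary, and none of this is saj-detect's API):

```python
import math

# Toy energy-based VAD for illustration only -- saj-detect uses a
# neural Silero-compatible model, not this heuristic.
def is_speech(frame, threshold=0.01):
    """Classify a frame of float samples (-1.0..1.0) by RMS energy."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 16000) for i in range(320)]

print(is_speech(silence), is_speech(tone))  # False True
```

A neural VAD makes the same frame-in, boolean-out decision but is far more robust to noise than any fixed energy threshold.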
saj-voice

Speaker ID

ECAPA-TDNN embeddings. Biometric-grade accuracy.

192-dim embed | Universal
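Speaker ID engines of this kind compare fixed-length embeddings by cosine similarity. A self-contained sketch of the verification step, with made-up 192-dim vectors standing in for real ECAPA-TDNN outputs:

```python
import math

# Cosine similarity between speaker embeddings. Real embeddings come
# from the ECAPA-TDNN model; these toy 192-dim vectors are made up,
# and the 0.7 threshold is an illustrative choice.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

dims = 192
enrolled = [1.0] * dims                      # enrolled speaker
same = [1.0] * (dims - 1) + [0.5]            # near-identical voice
other = [(-1.0) ** i for i in range(dims)]   # unrelated voice

THRESHOLD = 0.7  # accept a match when similarity clears the threshold
print(cosine(enrolled, same) > THRESHOLD)    # True
print(cosine(enrolled, other) > THRESHOLD)   # False
```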
saj-who

Diarization

Real-time speaker segmentation. Who spoke when.

Real-time | Universal
saj-clean

Noise Suppression

RNNoise + DeepFilterNet. Crystal-clear audio.

303KB model | Universal
saj-intent

Speech Intent

Direct intent extraction. Voice commands to actions.

Skip the text | EN AR
saj-clone

Voice Cloning

Clone any voice from samples. Self-service on-device.

Self-service | EN AR
saj-pipeline

Pipeline

Unified voice interaction: Wake, VAD, STT, Intent, TTS.

End-to-end | All langs
saj-codec

Neural Codec

Mimi neural codec. 1.2-6.0 kbps adaptive bitrate.

5.2x real-time | Universal
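The 1.2-6.0 kbps range implies a large compression ratio over raw PCM. A quick calculation, assuming 16 kHz / 16-bit mono input (typical for speech, but an assumption here, not a saj-codec spec):

```python
# Compression ratio of a 1.2-6.0 kbps neural codec vs raw PCM.
# Assumes 16 kHz, 16-bit, mono input.
raw_bps = 16_000 * 16 * 1            # 256,000 bits/s of raw PCM

for codec_kbps in (1.2, 6.0):
    ratio = raw_bps / (codec_kbps * 1000)
    print(f"{codec_kbps} kbps -> {ratio:.0f}x smaller than raw PCM")
```

That is roughly 213x smaller at the lowest bitrate and 43x at the highest.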
saj-sense

Multimodal Sensing

Unified token streams with per-modality encryption.

6 patents filed | Universal
saj-link

Encrypted Comms

Signal-compliant E2E encryption. X3DH + Double Ratchet.

184 tests | Universal
saj-server

Self-Hosted Server

REST + WebSocket. OpenAI/Deepgram-compatible API.

Multi-tenant | Docker
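saj-pipeline chains the engines above into one wake-to-response loop. A conceptual sketch with stub functions standing in for the real engines; none of these names are the SDK's actual API:

```python
# Conceptual wake -> VAD -> STT -> intent -> TTS loop. Every stage is
# a stub standing in for a real engine; nothing here is saj-pipeline's
# actual API.
def wake(frame):   return frame == "hey saj"
def vad(frame):    return frame != ""
def stt(frames):   return " ".join(frames)
def intent(text):  return {"action": "lights_on"} if "lights" in text else None
def tts(reply):    return f"<audio:{reply}>"

def run(stream):
    awake, buffered = False, []
    for frame in stream:
        if not awake:
            awake = wake(frame)        # 1. wait for the wake word
        elif vad(frame):
            buffered.append(frame)     # 2. collect speech frames
        elif buffered:
            text = stt(buffered)       # 3. transcribe the utterance
            act = intent(text)         # 4. extract the intent
            if act:
                return tts("done")     # 5. speak a response
            buffered = []
    return None

print(run(["hey saj", "turn", "on", "the", "lights", ""]))
```

The point of the unified pipeline is that each stage gates the next: VAD and wake word run constantly at sub-millisecond cost, while the heavier STT and TTS stages only run on actual speech.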
Interactive Demo

Try Wake Word Detection in Your Browser

This runs entirely on your device via WebAssembly. No audio leaves your browser. Zero network requests.

COMING SOON
Zero audio data transmitted. Runs via WebAssembly on your CPU.
Native Arabic Support

أول منصة صوت ذكي تدعم العربية أصلياً

First Voice AI SDK with Native Arabic

1.09 billion Arabic, Hindi, and Urdu speakers, and no on-device voice AI competitor serving them natively. Not an afterthought: built into the foundation.

الفصحى

MSA (Modern Standard)

Full coverage: Wake word, STT, TTS, Intent. The formal register of 420M+ speakers.

Full Support
الخليجي

Gulf Arabic

Wake word + STT + TTS. Tuned for UAE, Saudi, Bahrain, Kuwait, Qatar, Oman.

Full Support
المصري

Egyptian Arabic

100M+ speakers. The most widely understood Arabic dialect worldwide.

Q3 2026
الشامي

Levantine Arabic

Syria, Lebanon, Jordan, Palestine. 30M+ speakers.

Q3 2026
arabic_wake_word.py
from saj_speak import WakeEngine
 
# Arabic wake word: "Ya Saj" (يا ساج)
engine = WakeEngine("ya_saj.onnx")
engine.language = "ar"
engine.threshold = 0.9
 
for frame in microphone.stream():
    if engine.process(frame):
        print("يا ساج detected!")
Available wake words: Ya Saj (يا ساج) | Ya Bella (يا بيلا) | Ok Saj (اوكي ساج) | + Custom
Developer Experience

Ship Voice Features
in Minutes, Not Months

Install the SDK, load a model, process audio. Three steps. Production-ready voice AI in your app.

$ pip install sajspeak
01

Install

pip install sajspeak

One dependency. Python, Rust, Node.js, Swift, WASM, or C -- pick your binding.

02

Configure

WakeEngine("hey_saj.onnx")

Load a model, set a threshold. Use pre-trained models or train your own via Console.

03

Ship

engine.process(frame)

Runs on-device. No cloud. No API keys. No network latency. Ships with your binary.

Rust Python Node.js WASM Swift Kotlin C FFI
Comparison

Why Developers Switch to Saj Speak

More engines, better language support, open format, progressive pricing.

Feature | Saj Speak | Picovoice
Voice Engines | 16 engines | 8 engines
Core Language | Rust (memory-safe) | C (proprietary)
Model Format | ONNX (open standard) | Proprietary (.ppn)
Arabic Support | Native (MSA + Gulf) | Limited
Hindi / Urdu | First-class | None
Voice Cloning | Self-service | Not available
Neural Speech Codec | 1.2-6.0 kbps | Not available
E2E Encryption | Signal-compliant | Not available
Free Tier | Wake Word + VAD + 2 models | Non-commercial only
Pricing Model | From $29/mo (5 tiers) | Per-device fees

Simple, Transparent Pricing

Start free. Pay only when you ship to production. No per-inference fees.

Your models run on your hardware. No audio-minute charges. No surprise bills.

Free

$0 /month

Prototyping and learning

Wake Word + VAD | 2 pre-trained models | Community support | Non-commercial use
Get Started

Indie

$29 /month

Solo devs and side projects

All 16 engines | 5 pre-trained models | 1 developer seat | Commercial license | Email support
Start Free Trial
MOST POPULAR

Pro

$199 /month

Teams shipping to production

All 16 engines | 77 pre-trained models | 5 team seats | Custom model training (10/mo) | Arabic, Hindi, Urdu models | Priority support
Start Free Trial

Business

$999 /month

Scale deployments and custom models

Everything in Pro | Unlimited team seats | Unlimited model training | Custom language training | SSO + audit log | Phone + Slack support
Start Free Trial

Enterprise

Custom

OEM, fleet, and large-scale

Everything in Business | Per-device OEM licensing | MCU / embedded support | White-label option | Dedicated engineer | SLA + compliance
Contact Sales

Why On-Device?

Cloud voice APIs charge per minute, leak data, and add latency. On-device fixes all three.

Total Privacy

Zero audio data leaves the device. No cloud processing. No data retention. GDPR/HIPAA/NESA compliance by architecture.

Sub-ms Latency

0.27ms wake word inference vs 200-500ms cloud round-trip. Real-time voice interaction without waiting for the network.

Zero Per-Call Fees

No per-minute charges. No audio-hour billing. Your models run on your hardware. One SDK license, unlimited inference.
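The pricing difference compounds with volume. An illustrative comparison, where the $0.006/min cloud STT rate is an assumed example figure, not a quote from any specific provider:

```python
# Illustrative cost comparison: metered cloud STT vs a flat SDK license.
# The $0.006/min cloud rate is an assumed example, not a real quote.
cloud_per_min = 0.006
hours_per_month = 10_000

cloud_cost = cloud_per_min * 60 * hours_per_month   # metered bill
flat_license = 199                                  # Pro tier above

print(f"cloud: ${cloud_cost:,.0f}/mo vs flat: ${flat_license}/mo")
```

At that assumed rate, 10,000 audio-hours a month costs $3,600 on a metered API versus a flat $199 on-device, and the gap only widens as usage grows.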

Built in Rust. Runs on ONNX. Ships in as little as 107KB.

Memory-safe Rust core. Open ONNX model format -- no vendor lock-in. Models small enough to embed in firmware. 476 tests, zero failures. 15 patents filed at IP Australia.

476 tests passing
16 crates
15 patents filed
77 trained models

Start Building
Voice AI Today

Join developers building private, intelligent voice experiences that run entirely on-device. No cloud. No latency. No compromise.