On-Device Voice AI SDK

Build Voice AI
That Never Leaves
the Device

16 engines. One SDK. Every platform. From wake word detection to speech synthesis — all running on-device with sub-millisecond latency and zero cloud dependency.

$ pip install sajspeak
Hundreds of tests passing | IP protected | Open ONNX format
Python

from saj_speak import WakeEngine

engine = WakeEngine("hey_saj.onnx")
engine.threshold = 0.9

for frame in microphone.stream():
    if engine.process(frame):
        print("Wake word detected!")

Rust

use saj_speak::WakeEngine;

let mut engine = WakeEngine::from_model("hey_saj.onnx")?;
engine.set_threshold(0.9);

// Process audio frames
if engine.process(&audio_frame)? {
    println!("Wake word detected!");
}

JavaScript (WASM)

import { WakeEngine } from '@saj-speak/wasm';

const engine = await WakeEngine.load('hey_saj.onnx');
engine.threshold = 0.9;

engine.onDetection(() => {
    console.log('Wake word detected!');
});
engine.start();

Swift

import SajSpeak

let engine = try WakeEngine(model: "hey_saj.onnx")
engine.threshold = 0.9

engine.onDetection { result in
    print("Detected: \(result.confidence)")
}
try engine.start()
<1ms Inference Latency
16 Voice Engines
70+ Trained Models
<1MB Wake Word Model
4 Languages (EN, AR, HI, UR)
Why Saj Speak

Voice AI That Respects Your Users

Four principles. No compromise.

Privacy by Design

Your voice never leaves your device. Ever.

No cloud processing, no data retention, no telemetry. GDPR, HIPAA, and NESA compliant by architecture -- not by policy.

ع

Arabic-First

The only voice AI SDK with native Arabic from day one.

Not a translation. Not an afterthought. Arabic, Hindi, and Urdu built into the foundation alongside English.

Every Platform

One SDK. iOS, Android, Linux, Windows, macOS, Raspberry Pi, WASM, MCU.

Rust core with native bindings. Write once, deploy to phones, speakers, embedded devices, and browsers.

Open Format

ONNX-based models. No vendor lock-in.

Standard ONNX models you own and control. Inspect, fine-tune, or replace -- your models are never trapped in a proprietary format.

Complete Platform

16 Voice Engines. One SDK.

Every engine runs on-device. No cloud. No API keys. No per-inference fees. Ship voice features that work offline, everywhere.

saj-wake

Wake Word

Custom keyword detection. Lightweight neural architecture, under 1MB models.

<1ms inference | EN AR HI UR
saj-listen

Streaming STT

Real-time transcription with Zipformer + RNN-T.

Real-time | EN AR HI UR
saj-scribe

Batch STT

File transcription with Whisper. Split encoder/decoder.

Whisper | 100+ languages
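Whisper's encoder works on fixed 30-second windows, so batch transcription of a long file starts by splitting the audio. A minimal sketch of that chunking step in plain Python -- the function and constants are illustrative, not the saj-scribe API:

```python
SAMPLE_RATE = 16_000      # Hz, mono
WINDOW_SECONDS = 30       # Whisper's fixed encoder window

def chunk(samples, sample_rate=SAMPLE_RATE, window=WINDOW_SECONDS):
    # Split a long recording into fixed-length windows for the encoder.
    step = sample_rate * window
    return [samples[i:i + step] for i in range(0, len(samples), step)]

audio = [0.0] * (SAMPLE_RATE * 75)  # 75 s of mono audio
chunks = chunk(audio)
print(len(chunks))  # 30 s + 30 s + 15 s -> 3 windows
```

Splitting the encoder from the decoder lets each chunk's acoustic features be computed once and decoded separately.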
saj-speak-tts

Text-to-Speech

Natural on-device synthesis. Kokoro + Piper backends.

Voice cloning | EN AR HI
saj-detect

Voice Activity

Neural VAD with Silero-compatible architecture.

<1ms latency | Universal
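saj-detect uses a neural model; for intuition about what a VAD decides, here is the simplest possible voice-activity gate, a plain energy threshold. This is deliberately naive and is not the SDK's algorithm:

```python
def is_speech(frame, threshold=0.01):
    # Mean squared amplitude (energy) of a frame of float samples in [-1, 1].
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

silence = [0.0] * 160      # 10 ms of silence at 16 kHz
tone = [0.5, -0.5] * 80    # a loud alternating signal, same length

print(is_speech(silence))  # False
print(is_speech(tone))     # True
```

A neural VAD replaces the energy heuristic with a learned score, which is what keeps it robust to background noise that an energy gate would misclassify.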
saj-voice

Speaker ID

ECAPA-TDNN embeddings. Biometric-grade accuracy.

192-dim embed | Universal
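Speaker ID with embeddings typically works by comparing the 192-dim vector from an enrollment utterance against a new utterance using cosine similarity. A self-contained sketch of that comparison -- the threshold and function names are illustrative, not the saj-voice API:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative 192-dim embeddings; in practice these come from the
# enrollment utterance and the utterance being verified.
enrolled = [0.1] * 192
candidate = [0.1] * 192

SAME_SPEAKER_THRESHOLD = 0.7  # illustrative cut-off; tune per deployment
score = cosine_similarity(enrolled, candidate)
print(score >= SAME_SPEAKER_THRESHOLD)  # True: identical embeddings score 1.0
```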
saj-who

Diarization

Real-time speaker segmentation. Who spoke when.

Real-time | Universal
saj-clean

Noise Suppression

RNNoise + DeepFilterNet. Crystal-clear audio.

Lightweight | Universal
saj-intent

Speech Intent

Direct intent extraction. Voice commands to actions.

Skip the text | EN AR
saj-clone

Voice Cloning

Clone any voice from samples. Self-service on-device.

Self-service | EN AR
saj-pipeline

Pipeline

Unified voice interaction: Wake, VAD, STT, Intent, TTS.

End-to-end | All langs
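The pipeline's control flow is the interesting part: each downstream stage runs only after the upstream one fires. A runnable sketch with stub functions standing in for the saj-* engines (the stubs and their names are illustrative, not the SDK API):

```python
# Stubs stand in for the saj-* engines; only the control flow is real.
def wake_detected(frame):          # stand-in for saj-wake
    return frame == "hey saj"

def has_voice(frame):              # stand-in for saj-detect (VAD)
    return frame != ""

def transcribe(frame):             # stand-in for saj-listen (STT)
    return frame

def extract_intent(text):          # stand-in for saj-intent
    return {"action": "lights_on"} if "lights" in text else None

def run_pipeline(frames):
    awake, intents = False, []
    for frame in frames:
        if not awake:
            awake = wake_detected(frame)   # gate everything on the wake word
            continue
        if has_voice(frame):               # skip silence after waking
            intent = extract_intent(transcribe(frame))
            if intent:
                intents.append(intent)
    return intents

print(run_pipeline(["noise", "hey saj", "turn on the lights"]))
```

Gating the heavy stages behind wake word and VAD is what keeps the always-on power budget low: STT and intent only run on the small fraction of audio that matters.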
saj-codec

Neural Codec

Mimi neural codec. 1.2-6.0 kbps adaptive bitrate.

Real-time CPU | Universal
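At those bitrates the arithmetic is worth spelling out. A quick bytes-per-second calculation, with a common Opus voice bitrate (assumed at 24 kbps here purely for comparison):

```python
def bytes_per_second(kbps):
    # kilobits per second -> bytes per second
    return kbps * 1000 / 8

OPUS_KBPS = 24.0  # a common Opus voice bitrate, assumed for comparison
for kbps in (1.2, 6.0):
    ratio = OPUS_KBPS / kbps
    print(f"{kbps} kbps = {bytes_per_second(kbps):.0f} B/s "
          f"({ratio:.0f}x smaller than {OPUS_KBPS:.0f} kbps Opus)")
```

At the 1.2 kbps floor, a voice stream costs 150 bytes per second, small enough to survive constrained or degraded links.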
saj-sense

Multimodal Sensing

Unified token streams with per-modality encryption.

Patent portfolio | Universal
saj-link

Encrypted Comms

Signal-compliant E2E encryption. X3DH + Double Ratchet.

Tested | Universal
saj-server

Self-Hosted Server

REST + WebSocket. OpenAI/Deepgram-compatible API.

Multi-tenant | Docker
Interactive Demo

Try Wake Word Detection in Your Browser

This runs entirely on your device via WebAssembly. No audio leaves your browser. Zero network requests.

COMING SOON
Demo controls: model (hey_saj.onnx / ya_saj.onnx / hey_bella.onnx), language (EN / AR), and a live detection log showing per-detection confidence and sub-millisecond latency.
Zero audio data transmitted. Runs via WebAssembly on your CPU.
Native Arabic Support

أول منصة صوت ذكي تدعم العربية أصليًا

First Voice AI SDK with Native Arabic

Gulf Arabic in Riyadh. Egyptian in Cairo. Levantine in Beirut. One SDK. 420M+ Arabic speakers. 1.09 billion Arabic, Hindi, and Urdu speakers combined. Zero on-device voice AI competitors.

الفصحى

MSA (Modern Standard)

Full coverage: Wake word, STT, TTS, Intent. The formal register of 420M+ speakers.

Full Support
الخليجي

Gulf Arabic

Wake word + STT + TTS. Tuned for UAE, Saudi, Bahrain, Kuwait, Qatar, Oman.

Full Support
المصري

Egyptian Arabic

100M+ speakers. The most widely understood Arabic dialect worldwide.

Q3 2026
الشامي

Levantine Arabic

Syria, Lebanon, Jordan, Palestine. 30M+ speakers.

Q3 2026
arabic_wake_word.py
from saj_speak import WakeEngine
 
# Arabic wake word: "Ya Saj" (يا ساج)
engine = WakeEngine("ya_saj.onnx")
engine.language = "ar"
engine.threshold = 0.9
 
for frame in microphone.stream():
    if engine.process(frame):
        print("يا ساج detected!")
Available wake words: Ya Saj (يا ساج), Ya Bella (يا بيلا), Ok Saj (اوكي ساج), + Custom

Enterprise MENA Voice AI

420M+ speakers. Zero on-device voice AI competitors.

Banks, telcos, and government agencies across the Gulf need voice AI that processes Arabic on-device. Saj Speak is the only SDK that delivers -- with dialect awareness, sovereign data processing, and NESA compliance built in.

Contact Enterprise Sales
Developer Experience

Ship Voice Features
in Minutes, Not Months

Install the SDK, load a model, process audio. Three steps. Production-ready voice AI in your app.

$ pip install sajspeak
01

Install

pip install sajspeak

One dependency. Python, Rust, Node.js, Swift, WASM, or C -- pick your binding.

02

Configure

WakeEngine("hey_saj.onnx")

Load a model, set a threshold. Use pre-trained models or train your own via Console.

03

Ship

engine.process(frame)

Runs on-device. No cloud. No API keys. No network latency. Ships with your binary.
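The three steps above collapse into one short loop. Here it is made runnable with a stub class standing in for WakeEngine -- the stub's scoring is fake, but the configure-then-process flow has the same shape as the real SDK calls shown earlier:

```python
# StubWakeEngine stands in for saj_speak.WakeEngine so this runs anywhere;
# the real engine scores audio frames against an ONNX wake-word model.
class StubWakeEngine:
    def __init__(self, model_path, threshold=0.9):
        self.model_path = model_path   # step 02: load a model
        self.threshold = threshold     # step 02: set a threshold

    def process(self, frame):
        # Real engine: neural inference over the frame -> confidence score.
        # Stub: the frame itself is treated as the confidence score.
        return frame >= self.threshold

# step 03: process a stream of (fake) confidence scores
engine = StubWakeEngine("hey_saj.onnx", threshold=0.9)
detections = [f for f in (0.2, 0.95, 0.5, 0.91) if engine.process(f)]
print(len(detections))  # 2 frames cleared the 0.9 threshold
```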

Rust Python Node.js WASM Swift Kotlin C FFI
Open Source Foundation

Built on proven open-source models and inference engines. ONNX Runtime and Rust -- battle-tested technology you can trust.

Active Development

Continuously improved models with new languages, engines, and platform support shipping regularly. Growing developer community.

IP Protected

Extensive patent portfolio filed at IP Australia covering wake word detection, neural codecs, multimodal sensing, and more.

$ pip install sajspeak
Comparison

Why Developers Switch to Saj Speak

More engines, better language support, open format, progressive pricing.

Feature | Saj Speak | Picovoice
Voice Engines | 16 engines | 8 engines
Core Language | Rust (memory-safe) | C (proprietary)
Model Format | ONNX (open standard) | Proprietary (.ppn)
Arabic Support | Native (MSA + Gulf) | Limited
Hindi / Urdu | First-class | None
Voice Cloning | Self-service | Not available
Neural Speech Codec | 1.2-6.0 kbps | Not available
E2E Encryption | Signal-compliant | Not available
Free Tier | Wake Word + VAD + 2 models | Non-commercial only
Pricing Model | From $29/mo (5 tiers) | Per-device fees

Simple, Transparent Pricing

Start free. Pay only when you ship to production. No per-inference fees.

Your models run on your hardware. No audio-minute charges. No surprise bills.

Free

$0 /month

Prototyping and learning

Wake Word + VAD
2 pre-trained models
Community support
Non-commercial use
Get Started

Indie

$29 /month

Solo devs and side projects

All 16 engines
5 pre-trained models
1 developer seat
Commercial license
Email support
Start Free Trial
MOST POPULAR

Pro

$199 /month

Teams shipping to production

All 16 engines
Full model library
5 team seats
Custom model training (10/mo)
Arabic, Hindi, Urdu models
Priority support
Start Free Trial

Business

$999 /month

Scale deployments and custom models

Everything in Pro
Unlimited team seats
Unlimited model training
Custom language training
SSO + audit log
Phone + Slack support
Start Free Trial

Enterprise

Custom

OEM, fleet, and large-scale

Everything in Business
Per-device OEM licensing
MCU / embedded support
White-label option
Dedicated engineer
SLA + compliance
Contact Sales

Why On-Device?

Cloud voice APIs charge per minute, leak data, and add latency. On-device fixes all three.

Total Privacy

Zero audio data leaves the device. No cloud processing. No data retention. GDPR/HIPAA/NESA compliance by architecture.

Sub-ms Latency

Sub-millisecond wake word inference vs 200-500ms cloud round-trip. Real-time voice interaction without waiting for the network.

Zero Per-Call Fees

No per-minute charges. No audio-hour billing. Your models run on your hardware. One SDK license, unlimited inference.
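To make the billing difference concrete, a back-of-envelope comparison. The cloud per-minute rate below is a hypothetical figure for illustration only; the flat fee is the Pro tier from the pricing table above:

```python
# Hypothetical cloud STT rate (illustrative, not a quote from any vendor)
CLOUD_RATE_PER_MIN = 0.006       # $ per audio-minute
FLAT_LICENSE = 199.0             # $ per month: the Pro tier above

minutes_per_month = 1_000_000    # fleet-scale audio volume
cloud_bill = CLOUD_RATE_PER_MIN * minutes_per_month  # grows with usage
print(cloud_bill, FLAT_LICENSE)  # per-minute billing vs a flat fee
```

The cloud bill scales linearly with audio volume; the on-device license does not, because inference runs on hardware you already own.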

Built on Saj Speak

Saj Link

Sovereign encrypted communications -- built on the same voice AI engines that power Saj Speak.

E2E encrypted messaging, neural codec voice calls at ultra-low bandwidth, on-device meeting intelligence, and Arabic-first design. The "sovereign Slack" position is unoccupied -- Saj Link fills it.

Signal-Grade Encryption

X3DH + Double Ratchet. E2E for messages, calls, and media.

Neural Codec Calls

Crystal-clear voice at 1-6 kbps. 20-30x less bandwidth than Opus.

Voice Intelligence

On-device transcription, translation, and meeting summaries.

ع
Arabic-First Design

Native RTL, dialect awareness, voice messages as rich cards.

Built in Rust. Runs on ONNX. Ships Under 1MB.

Memory-safe Rust core. Open ONNX model format -- no vendor lock-in. Models small enough to embed in firmware. Hundreds of tests, zero failures. Protected by an extensive patent portfolio.

Hundreds of tests passing
16 crates
Patent portfolio filed
70+ trained models

Start Building
Voice AI Today

Join 200+ developers building private, intelligent voice experiences that run entirely on-device. No cloud. No latency. No compromise.

$ pip install sajspeak