16 engines. One SDK. Every platform. From wake word detection to speech synthesis — all running on-device with sub-millisecond latency and zero cloud dependency.
Four principles. No compromise.
Your voice never leaves your device. Ever.
No cloud processing, no data retention, no telemetry. GDPR, HIPAA, and NESA compliant by architecture -- not by policy.
The only voice AI SDK with native Arabic from day one.
Not a translation. Not an afterthought. Arabic, Hindi, and Urdu built into the foundation alongside English.
One SDK. iOS, Android, Linux, Windows, macOS, Raspberry Pi, WASM, MCU.
Rust core with native bindings. Write once, deploy to phones, speakers, embedded devices, and browsers.
ONNX-based models. No vendor lock-in.
Standard ONNX models you own and control. Inspect, fine-tune, or replace -- your models are never trapped in a proprietary format.
Every engine runs on-device. No cloud. No API keys. No per-inference fees. Ship voice features that work offline, everywhere.
Custom keyword detection. Lightweight neural architecture, under 1MB models.
Real-time transcription with Zipformer + RNN-T.
File transcription with Whisper. Split encoder/decoder.
Natural on-device synthesis. Kokoro + Piper backends.
Neural VAD with Silero-compatible architecture.
ECAPA-TDNN embeddings. Biometric-grade accuracy.
Real-time speaker segmentation. Who spoke when.
RNNoise + DeepFilterNet. Crystal-clear audio.
Direct intent extraction. Voice commands to actions.
Clone any voice from samples. Self-service on-device.
Unified voice interaction: Wake, VAD, STT, Intent, TTS.
Mimi neural codec. 1.2-6.0 kbps adaptive bitrate.
Unified token streams with per-modality encryption.
Signal-compliant E2E encryption. X3DH + Double Ratchet.
REST + WebSocket. OpenAI/Deepgram-compatible API.
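The Mimi codec's bitrate range above invites a quick back-of-envelope check. A minimal sketch, assuming typical Opus voice bitrates of 24-32 kbps (an assumption for comparison; Opus itself supports a much wider range):

```python
# Back-of-envelope check on the Mimi codec's bandwidth at its low end,
# against assumed typical Opus voice bitrates (24-32 kbps).
mimi_kbps = 1.2                            # low end of the 1.2-6.0 kbps range
opus_low_kbps, opus_high_kbps = 24.0, 32.0 # assumed typical Opus voice rates

savings_low = opus_low_kbps / mimi_kbps    # 20.0x
savings_high = opus_high_kbps / mimi_kbps  # ~26.7x
kb_per_minute = mimi_kbps * 60 / 8         # kilobytes for one minute of audio

print(f"{savings_low:.0f}x to {savings_high:.1f}x less bandwidth than Opus")
print(f"{kb_per_minute:.1f} KB per minute at {mimi_kbps} kbps")
```

At 1.2 kbps, a full minute of voice fits in about 9 KB, which is what makes ultra-low-bandwidth calls practical.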
This runs entirely on your device via WebAssembly. No audio leaves your browser. Zero network requests.
Gulf Arabic in Riyadh. Egyptian in Cairo. Levantine in Beirut. One SDK. 420M+ Arabic speakers. 1.09 billion Arabic, Hindi, and Urdu speakers combined. Zero on-device voice AI competitors.
Full coverage: Wake word, STT, TTS, Intent. The formal register of 420M+ speakers.
Wake word + STT + TTS. Tuned for UAE, Saudi, Bahrain, Kuwait, Qatar, Oman.
100M+ speakers. The most widely understood Arabic dialect worldwide.
Syria, Lebanon, Jordan, Palestine. 30M+ speakers.
420M+ speakers. Zero on-device voice AI competitors.
Banks, telcos, and government agencies across the Gulf need voice AI that processes Arabic on-device. Saj Speak is the only SDK that delivers -- with dialect awareness, sovereign data processing, and NESA compliance built in.
Contact Enterprise Sales
Install the SDK, load a model, process audio. Three steps. Production-ready voice AI in your app.
$ pip install sajspeak
One dependency. Python, Rust, Node.js, Swift, WASM, or C -- pick your binding.
WakeEngine("hey_saj.onnx")
Load a model, set a threshold. Use pre-trained models or train your own via Console.
engine.process(frame)
Runs on-device. No cloud. No API keys. No network latency. Ships with your binary.
Built on proven open-source models and inference engines. ONNX Runtime and Rust -- battle-tested technology you can trust.
Models are continuously improved, with new languages, engines, and platform support shipping regularly. Growing developer community.
Extensive patent portfolio filed at IP Australia covering wake word detection, neural codecs, multimodal sensing, and more.
More engines, better language support, open format, progressive pricing.
| Feature | Saj Speak | Picovoice |
|---|---|---|
| Voice Engines | 16 engines | 8 engines |
| Core Language | Rust (memory-safe) | C (closed source) |
| Model Format | ONNX (open standard) | Proprietary (.ppn) |
| Arabic Support | Native (MSA + Gulf) | Limited |
| Hindi / Urdu | First-class | None |
| Voice Cloning | Self-service | Not available |
| Neural Speech Codec | 1.2-6.0 kbps | Not available |
| E2E Encryption | Signal-compliant | Not available |
| Free Tier | Wake Word + VAD + 2 models | Non-commercial only |
| Pricing Model | From $29/mo (5 tiers) | Per-device fees |
Start free. Pay only when you ship to production. No per-inference fees.
Your models run on your hardware. No audio-minute charges. No surprise bills.
Prototyping and learning
Solo devs and side projects
Teams shipping to production
Scale deployments and custom models
OEM, fleet, and large-scale
| Feature | Free | Indie | Pro | Business | Enterprise |
|---|---|---|---|---|---|
| Engines | | | | | |
| Wake Word | ✓ | ✓ | ✓ | ✓ | ✓ |
| VAD | ✓ | ✓ | ✓ | ✓ | ✓ |
| STT (Streaming) | -- | ✓ | ✓ | ✓ | ✓ |
| STT (Batch/Whisper) | -- | ✓ | ✓ | ✓ | ✓ |
| TTS | -- | ✓ | ✓ | ✓ | ✓ |
| Speaker ID | -- | ✓ | ✓ | ✓ | ✓ |
| Diarization | -- | -- | ✓ | ✓ | ✓ |
| Speech-to-Intent | -- | -- | ✓ | ✓ | ✓ |
| Voice Cloning | -- | -- | -- | ✓ | ✓ |
| Models & Training | | | | | |
| Pre-trained models | 2 | 5 | All | All | All + custom |
| Custom model training | -- | -- | 10/mo | Unlimited | Unlimited |
| Languages | | | | | |
| English | ✓ | ✓ | ✓ | ✓ | ✓ |
| Arabic (MSA + Gulf) | -- | -- | ✓ | ✓ | ✓ |
| Hindi / Urdu | -- | -- | ✓ | ✓ | ✓ |
| Support | | | | | |
| Community (Discord) | ✓ | ✓ | ✓ | ✓ | ✓ |
| | -- | ✓ | ✓ | ✓ | ✓ |
| Priority support | -- | -- | ✓ | ✓ | ✓ |
| Dedicated engineer | -- | -- | -- | -- | ✓ |
| SLA guarantee | -- | -- | -- | -- | ✓ |
Cloud voice APIs charge per minute, leak data, and add latency. On-device fixes all three.
Zero audio data leaves the device. No cloud processing. No data retention. GDPR/HIPAA/NESA compliance by architecture.
Sub-millisecond wake word inference vs 200-500ms cloud round-trip. Real-time voice interaction without waiting for the network.
No per-minute charges. No audio-hour billing. Your models run on your hardware. One SDK license, unlimited inference.
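The round-trip figures above translate directly into stale audio. A rough budget, assuming 32 ms frames at 16 kHz (illustrative values, not documented Saj Speak parameters):

```python
# Rough latency budget: how many frames of audio are already "in flight"
# by the time a cloud response returns. The 200-500 ms round-trip range
# comes from the comparison above; the 32 ms frame length is assumed.
frame_ms = 32
in_flight = {rtt: rtt // frame_ms for rtt in (200, 500)}
for rtt_ms, frames in in_flight.items():
    print(f"{rtt_ms} ms round trip -> {frames} frames of audio elapsed")
```

On-device inference answers within the current frame; a cloud hop means the user has already spoken 6-15 more frames before any result arrives.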
Sovereign encrypted communications -- built on the same voice AI engines that power Saj Speak.
E2E encrypted messaging, neural codec voice calls at ultra-low bandwidth, on-device meeting intelligence, and Arabic-first design. The "sovereign Slack" position is unoccupied -- Saj Link fills it.
X3DH + Double Ratchet. E2E for messages, calls, and media.
Crystal-clear voice at 1.2-6.0 kbps. 20-30x less bandwidth than Opus.
On-device transcription, translation, and meeting summaries.
Native RTL, dialect awareness, voice messages as rich cards.
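The Double Ratchet named above is built around a simple idea: derive each message key from a chain key, then advance the chain, so old keys can never be recomputed. A concept sketch of that symmetric chain step using stdlib HMAC (this omits the X3DH handshake and the Diffie-Hellman ratchet entirely; it illustrates the mechanism, not Signal's implementation):

```python
# Toy symmetric-key ratchet: each step derives a one-time message key
# from the current chain key, then replaces the chain key, giving
# forward secrecy -- compromising today's chain key reveals nothing
# about yesterday's message keys. Concept sketch only, not Signal.
import hashlib
import hmac

def kdf(chain_key: bytes, label: bytes) -> bytes:
    return hmac.new(chain_key, label, hashlib.sha256).digest()

def ratchet_step(chain_key: bytes):
    message_key = kdf(chain_key, b"msg")      # encrypt one message with this
    next_chain_key = kdf(chain_key, b"chain") # then advance the chain
    return next_chain_key, message_key

# Both parties start from the same shared secret (from the handshake).
alice = bob = hashlib.sha256(b"shared secret from handshake").digest()

alice, k1_a = ratchet_step(alice)
bob, k1_b = ratchet_step(bob)
assert k1_a == k1_b        # both sides derive the same message key
alice, k2_a = ratchet_step(alice)
assert k2_a != k1_a        # keys change every step
```

In the full protocol, a Diffie-Hellman ratchet periodically injects fresh entropy into this chain, which is what also gives post-compromise security.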
Memory-safe Rust core. Open ONNX model format -- no vendor lock-in. Models small enough to embed in firmware. Hundreds of tests, zero failures. Protected by an extensive patent portfolio.
Join 200+ developers building private, intelligent voice experiences that run entirely on-device. No cloud. No latency. No compromise.