Custom model / PBX automation / batch + live API surfaces

Ship PBX agents, batch transcription, and live speech APIs from one custom model.

regn.io turns the verified 20.5 ASR bundle into three saleable products: a PBX voice agent, a batch speech API, and a realtime speech API. The positioning is commercial. The runtime stays operator-grade.

  • 3 products
  • On-prem or managed
  • CPU, INT8 CPU, and CUDA tiers
  • Real custom bundle path
  • api.regn.io wired

PBX Product: 3CX / SIP

Telephony automation product with STT, LLM, and TTS all running locally.

Batch Product: WAV / file API

Clean path for recorded calls, archives, voicemails, and job-based transcription.

Live Product: WS first

WebSocket ingress first, then WebRTC once the streaming gateway is fully productized.

PBX runtime: 20.5
Pinned to the real checkpoint bundle path rather than the legacy Whisper path.

Batch throughput: 25,913
Balanced GPU WPM on the verified RTX 5090 run.

CPU anchor: 6,712
Balanced FP32 WPM on the 285K host for plain CPU deployments.

Product split: 3 SKUs
PBX agent, batch API, and realtime speech API.

Three products, one runtime story.

The stack is intentionally compact. Sell the PBX outcome, sell the batch inference surface, and extend the same model into live streaming instead of maintaining separate model families.

Product 1

PBX Voice Agent

AI voice automation for 3CX, SIP, and PBX environments where telephony behavior matters as much as transcription.

  • Best fit: reception, routing, callback capture, internal ops.
  • Packaging: customer-hosted or operator-managed Linux runtime.
  • Status: strongest shipped product path today.

Product 2

Batch Speech API

Recorded-audio inference for WAV and archived calls with clear profile and device choices.

  • Best fit: back-office processing, voicemail, archives, QA pipelines.
  • Surface: file-based transcription and job-oriented API wrapping.
  • Status: inference-ready, commercial wrapper next.

Product 3

Realtime Speech API

Live transcription product around the same model with WebSocket ingress first and WebRTC next.

  • Best fit: browser capture, operators, live assistants, streaming workflows.
  • Posture: GPU-first for lower-latency multi-session work.
  • Status: next build target.

Benchmark-backed capacity anchors.

The numbers behind the product surface still matter. These are the warm throughput anchors used for deployment planning across the current stack.

GPU fast 32,929 WPM

Best throughput posture for premium batch and future live tiers.

CPU INT8 fast 8,980 WPM

Best CPU-only production posture on the verified 285K machine.

1 core INT8 1,935 WPM

Per-core anchor when you need dense fleet planning instead of single-box estimates.

Device                    Profile        Warm WPM  Best use
RTX 5090                  Fast             32,929  Premium throughput, future live tier, highest-volume batch.
RTX 5090                  Balanced         25,913  Conservative GPU default for strong production headroom.
Intel Core Ultra 9 285K   INT8 Fast         8,980  Best CPU-only production posture.
Intel Core Ultra 9 285K   FP32 Balanced     6,712  Simpler CPU deployments with fewer runtime choices.
Single core               INT8              1,935  Per-core planning anchor for denser fleets.
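
For planning purposes, a WPM anchor can be turned into a rough realtime multiple. The sketch below is illustrative only: the throughput figures come from the table above, while the 150 words-per-minute speaking rate and the function names are assumptions introduced here, not measured values or part of any regn.io API.

```python
# Warm throughput anchors from the table above, in words per minute (WPM).
ANCHORS_WPM = {
    "RTX 5090 / Fast": 32_929,
    "RTX 5090 / Balanced": 25_913,
    "Core Ultra 9 285K / INT8 Fast": 8_980,
    "Core Ultra 9 285K / FP32 Balanced": 6_712,
    "Single core / INT8": 1_935,
}

# ASSUMPTION (not from the source): average spoken rate of the audio.
ASSUMED_SPEECH_WPM = 150


def realtime_factor(throughput_wpm, speech_wpm=ASSUMED_SPEECH_WPM):
    """Minutes of audio transcribed per minute of wall-clock time."""
    return throughput_wpm / speech_wpm


def audio_hours_per_day(throughput_wpm, speech_wpm=ASSUMED_SPEECH_WPM):
    """Hours of audio one device could clear in 24 hours at full utilization."""
    return realtime_factor(throughput_wpm, speech_wpm) * 24


for name, wpm in ANCHORS_WPM.items():
    print(f"{name}: ~{realtime_factor(wpm):.1f}x realtime, "
          f"~{audio_hours_per_day(wpm):,.0f} audio hours/day")
```

Real recordings contain silence, hold music, and varying speech rates, so these multiples are best read as an upper-bound estimate rather than a guaranteed capacity.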

Deployment shapes

Compact deployment ladder.

Use the smallest posture that still matches the workflow and the concurrency target.

  • Standalone: customer-hosted PBX or internal batch processing with local control.
  • Managed CPU INT8: efficient hosted transcription without GPU dependency.
  • Managed GPU: premium throughput and live-session headroom.
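
One way to apply the "smallest posture that still fits" rule is to walk the throughput anchors from lowest to highest and stop at the first one that covers the target. This is a minimal sketch under stated assumptions: the posture labels, the 150 WPM speaking rate, and the 70% headroom factor are hypothetical planning inputs, and the warm batch anchors say nothing about live-session latency.

```python
# Hypothetical ladder built from the warm throughput anchors quoted on this
# page, ordered from the smallest posture to the largest.
LADDER = [
    ("CPU FP32 Balanced", 6_712),
    ("CPU INT8 Fast", 8_980),
    ("GPU Balanced", 25_913),
    ("GPU Fast", 32_929),
]

ASSUMED_SPEECH_WPM = 150  # ASSUMPTION: typical speaking rate, not from the source
ASSUMED_HEADROOM = 0.7    # ASSUMPTION: plan to use at most 70% of an anchor


def required_wpm(concurrent_sessions, speech_wpm=ASSUMED_SPEECH_WPM):
    """Aggregate words per minute produced by N simultaneous speakers."""
    return concurrent_sessions * speech_wpm


def smallest_posture(target_wpm, headroom=ASSUMED_HEADROOM):
    """Return the first ladder entry whose derated anchor covers the target."""
    for name, anchor_wpm in LADDER:
        if anchor_wpm * headroom >= target_wpm:
            return name
    return None  # no single-box posture fits; scale out instead


target = required_wpm(concurrent_sessions=40)
print(target, smallest_posture(target))
```

A `None` result means the target exceeds every single-box anchor after derating, which is the point where multiple hosts need to be planned.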

Choose the surface that matches the workflow: PBX outcome, raw file inference, or live streaming ingress.

  • PBX voice agent for telephony workflows
  • Batch API for WAV and recorded audio
  • Realtime API for live transcription sessions