Performance

How fast the Rapidly SDK runs and what affects throughput.

The Rapidly SDK is engineered for real-time embedded use. In the Benchmarks section, every model in the v1.0 catalog runs at least 12x faster than real time on a single CPU core. The factors described on this page determine actual throughput on a target device.

Real-time factor

The real-time factor is the ratio of audio duration to processing time. A real-time factor of 25x means one second of audio takes 1 / 25 of a second, or 40 milliseconds, to process. Anything above 1x is real-time capable. The higher the factor, the more headroom there is for additional tracks or other work on the same core.
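The arithmetic above can be sketched in a few lines of Python. The function names are illustrative only and are not part of the Rapidly SDK; the second helper assumes that a single real-time stream occupies a fraction 1 / factor of one core.

```python
def processing_time_ms(audio_seconds: float, rtf: float) -> float:
    """Milliseconds needed to process `audio_seconds` of audio at real-time factor `rtf`."""
    if rtf <= 0:
        raise ValueError("real-time factor must be positive")
    return audio_seconds * 1000.0 / rtf


def core_utilization(rtf: float) -> float:
    """Fraction of one core used by a single real-time stream (assumed to be 1 / rtf)."""
    if rtf <= 0:
        raise ValueError("real-time factor must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # One second of audio at 25x takes 40 ms, which is 4% of one core.
    print(processing_time_ms(1.0, 25.0))
    print(core_utilization(25.0))
```

At 25x, the remaining 96% of the core is the headroom available for other tracks or other work.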

Per-model real-time factors are listed in the Benchmarks section below.

What affects throughput

Factor | How it changes throughput
Model latency variant (11, 21, 32, 96 ms) | Higher latencies tend to have higher throughput because they process larger blocks at a time.
Model size variant (micro vs full) | The micro variant has a real-time factor several times higher than the full model, at a small quality cost.
Channel count | Mono is roughly half the cost of stereo. The engine processes channels independently.
Sample rate | The engine resamples to the model's training sample rate. Higher input rates add resampling cost.
Hardware acceleration | All modern desktop and mobile targets benefit. See Overview.
CPU architecture | ARM and Intel both perform well. NPU and GPU offload are not currently used.
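As a rough planning aid, the channel-count row can be turned into a capacity estimate. This is a sketch under stated assumptions, not SDK behavior: it assumes cost scales linearly with channel count (the page only states that mono is roughly half the cost of stereo), it ignores resampling cost, and the `headroom` parameter is a hypothetical safety margin.

```python
import math


def effective_rtf(mono_rtf: float, channels: int) -> float:
    """Approximate real-time factor for a multi-channel stream.

    Assumption: each channel costs the same as one mono stream.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return mono_rtf / channels


def max_streams_per_core(mono_rtf: float, channels: int, headroom: float = 0.5) -> int:
    """Number of real-time streams that fit on one core while leaving
    the `headroom` fraction of the core free for other work."""
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable_fraction = 1.0 - headroom
    return math.floor(usable_fraction * effective_rtf(mono_rtf, channels))


if __name__ == "__main__":
    # A 27x mono model drops to about 13.5x for stereo input.
    print(effective_rtf(27.0, 2))
    # Stereo streams per core when half the core is kept free.
    print(max_streams_per_core(27.0, 2, headroom=0.5))
```

Treat the result as an upper bound for planning and confirm it with measurements on the target device.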

Picking a model for the CPU budget

For each latency, there is a trade-off between processing strength and CPU cost. As a rough guide:

  • Tight latency and low CPU budget. Use a micro variant. Example: speech-denoise-micro-32ms.v1.0.rapidly is compact (241 KB) and has a very high real-time factor (125x in the Benchmarks section).
  • Tight latency and quality matters. Use the full 32 ms variants. Strong noise and reverb reduction with moderate latency.
  • Lowest possible latency. Use the 11 ms or 21 ms variants. Trades some suppression strength for lower delay.
  • Offline or post-production. Use the 96 ms variants for the highest fidelity.
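The guide above can be expressed as a small selection helper. This is a minimal sketch, not an SDK feature: the catalog entries copy the speech-denoise rows from the Benchmarks section, and the ranking rule (prefer a full variant, then the longest latency that fits the budget) is an assumed reading of the guidance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    latency_ms: int
    micro: bool
    rtf: float  # single-core real-time factor from the Benchmarks section


CATALOG: List[Model] = [
    Model("speech-denoise-11ms.v1.0.rapidly", 11, False, 13),
    Model("speech-denoise-21ms.v1.0.rapidly", 21, False, 13),
    Model("speech-denoise-32ms.v1.0.rapidly", 32, False, 27),
    Model("speech-denoise-micro-32ms.v1.0.rapidly", 32, True, 125),
    Model("speech-denoise-96ms.v1.0.rapidly", 96, False, 29),
]


def pick_model(max_latency_ms: int, min_rtf: float,
               catalog: List[Model] = CATALOG) -> Optional[Model]:
    """Return the preferred model within the latency and CPU budget, or None."""
    candidates = [m for m in catalog
                  if m.latency_ms <= max_latency_ms and m.rtf >= min_rtf]
    if not candidates:
        return None
    # Assumed ranking: full variants beat micro, then longer latency wins.
    return max(candidates, key=lambda m: (not m.micro, m.latency_ms))


if __name__ == "__main__":
    print(pick_model(32, 20))   # the full 32 ms variant fits this budget
    print(pick_model(32, 100))  # only the micro variant reaches 100x
    print(pick_model(10, 1))    # no model has latency this low
```

The `min_rtf` argument should reflect the target device rather than the benchmark CPU, since the published factors were measured on one specific processor.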

The full model catalog is on the Models page.

Memory considerations

Each RapidlyEngine instance loads its own copy of the model into memory. Model file sizes are listed in the Models catalog. Running multiple instances in parallel for separate streams or chunks multiplies memory by the instance count. See Parallel processing for when this pattern is appropriate.
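A back-of-the-envelope estimate of the model memory for a pool of instances follows directly from this. The sketch assumes that the in-memory model is about the same size as its file, and it excludes audio buffers and other runtime state, which this page does not quantify; the function name is illustrative.

```python
def model_memory_kb(file_size_kb: float, instances: int) -> float:
    """Approximate memory held by model copies alone, in KB.

    Assumption: each instance holds one copy roughly the size of the model file.
    """
    if instances < 0:
        raise ValueError("instances must be non-negative")
    return file_size_kb * instances


if __name__ == "__main__":
    # Eight parallel instances of an 854 KB model hold about 6832 KB of model data.
    total_kb = model_memory_kb(854, 8)
    print(total_kb, "KB, or about", round(total_kb / 1024, 2), "MB")
```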

On-device vs cloud

Real-time use on-device is the canonical case. The real-time factors in the catalog give the engine plenty of headroom even on phones. For cloud workloads that split a long file into chunks and process them on many instances sharing cores, see Parallel processing.

Benchmarks

The table below lists the latency, file size, and real-time factor for each model in the v1.0 catalog.

Model | Latency | File size | Real-time factor
speech-denoise-11ms.v1.0.rapidly | 11 ms | 615 KB | 13x
speech-denoise-21ms.v1.0.rapidly | 21 ms | 851 KB | 13x
speech-denoise-32ms.v1.0.rapidly | 32 ms | 854 KB | 27x
speech-denoise-micro-32ms.v1.0.rapidly | 32 ms | 241 KB | 125x
speech-denoise-96ms.v1.0.rapidly | 96 ms | 925 KB | 29x
speech-denoise-dereverb-11ms.v1.0.rapidly | 11 ms | 615 KB | 12x
speech-denoise-dereverb-21ms.v1.0.rapidly | 21 ms | 851 KB | 12x
speech-denoise-dereverb-32ms.v1.0.rapidly | 32 ms | 854 KB | 25x
speech-denoise-dereverb-micro-32ms.v1.0.rapidly | 32 ms | 241 KB | 115x
speech-denoise-dereverb-96ms.v1.0.rapidly | 96 ms | 926 KB | 27x

Benchmark conditions

  • CPU: single core of an AMD Ryzen AI MAX+ 395
  • Audio: mono input at the model's training sample rate (48 kHz)
  • Channel count: 1
  • Build: v1.0 release binaries

Other platforms

Real-time factors on Apple Silicon, Intel Mac, ARM-based servers, iPhones, and Android devices are not yet published in this catalog. The engine targets real-time use on every platform listed in Overview. Contact us for measured numbers on your target hardware.
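In the meantime, a real-time factor can be estimated on any target by timing the processing call directly. The sketch below is generic: `process_block` is a hypothetical stand-in for whatever function runs one block of audio through the engine, and nothing here reflects the actual Rapidly SDK API.

```python
import time
from typing import Callable, Sequence


def measure_rtf(process_block: Callable[[Sequence[float]], object],
                block: Sequence[float],
                sample_rate: int,
                repeats: int = 100) -> float:
    """Estimate the real-time factor: audio duration divided by wall-clock processing time."""
    if repeats < 1 or sample_rate < 1 or len(block) == 0:
        raise ValueError("repeats, sample_rate, and block length must be positive")
    start = time.perf_counter()
    for _ in range(repeats):
        process_block(block)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("timer resolution too coarse; increase repeats")
    audio_seconds = repeats * len(block) / sample_rate
    return audio_seconds / elapsed


if __name__ == "__main__":
    # Placeholder workload standing in for the engine: halve every sample.
    def fake_process(samples: Sequence[float]) -> list:
        return [s * 0.5 for s in samples]

    block = [0.0] * 480  # 10 ms of mono audio at 48 kHz
    print(f"{measure_rtf(fake_process, block, 48000):.1f}x")
```

For numbers comparable to the Benchmarks table, pin the process to a single core and feed mono audio at 48 kHz, matching the benchmark conditions.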