Performance
How fast the Rapidly SDK runs and what affects throughput.
The Rapidly SDK is engineered for real-time embedded use. Every model in the catalog runs many times faster than real time on a single CPU core. The factors below determine actual throughput on a target device.
Real-time factor
The real-time factor is the ratio of audio duration to processing time. A real-time factor of 25x means one second of audio takes 1 / 25, or 40 milliseconds, to process. Anything above 1x is real-time capable. The higher the factor, the more headroom for additional tracks or other work on the same core.
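
The arithmetic above can be sketched in a few lines of Python. The function names are illustrative and not part of the Rapidly SDK; the second helper assumes you have measured the processing time yourself.

```python
def processing_time_ms(audio_seconds: float, real_time_factor: float) -> float:
    """Milliseconds needed to process `audio_seconds` of audio."""
    if real_time_factor <= 0:
        raise ValueError("real_time_factor must be positive")
    return audio_seconds * 1000.0 / real_time_factor


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Real-time factor from a measured run: audio duration / processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# One second of audio at 25x takes 40 ms to process.
print(processing_time_ms(1.0, 25))
```
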
Per-model real-time factors are listed in the Benchmarks section below.
What affects throughput
| Factor | How it changes throughput |
|---|---|
| Model latency variant (11, 21, 32, 96 ms) | Higher latencies tend to have higher throughput because they process larger blocks at a time. |
| Model size variant (micro vs full) | The micro variant has a real-time factor several times higher than the full model, at a small quality cost. |
| Channel count | Mono is roughly half the cost of stereo. The engine processes channels independently. |
| Sample rate | The engine resamples to the model's training sample rate. Higher input rates add resampling cost. |
| Hardware acceleration | All modern desktop and mobile targets benefit. See Overview. |
| CPU architecture | ARM and Intel both perform well. NPU and GPU offload are not currently used. |
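
As a rough illustration of the channel-count row, the sketch below scales a mono real-time factor by the number of channels. The table only states that mono costs roughly half of stereo and that channels are processed independently; linear scaling beyond two channels is an assumption, and resampling cost is ignored. The function name is hypothetical.

```python
def estimated_rtf(mono_rtf: float, channels: int) -> float:
    """Rough real-time factor when `channels` channels run on one core.

    Assumes cost grows linearly with channel count (an assumption beyond
    stereo) and that no resampling is needed.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return mono_rtf / channels


# A model benchmarked at 27x in mono would be expected near 13.5x in stereo.
print(estimated_rtf(27, 2))
```
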
Picking a model for the CPU budget
For each latency, there is a trade-off between processing strength and CPU cost. As a rough guide:
- Tight latency and low CPU budget. Use a micro variant. Example: speech-denoise-micro-32ms.v1.0.rapidly is compact, with a very high real-time factor.
- Tight latency and quality matters. Use the full 32 ms variants. Strong noise and reverb reduction with moderate latency.
- Lowest possible latency. Use the 11 ms or 21 ms variants. Trades some suppression strength for lower delay.
- Offline or post-production. Use the 96 ms variants for the highest fidelity.
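
The guide above could be turned into a small selection helper along these lines. This is a sketch, not an SDK feature: the `Model` class, `CATALOG` list, and `pick_model` function are hypothetical, the real-time factors are copied from the Benchmarks table on this page, and the ranking rule (full models before micro, then higher latency) is an assumption drawn from the guidance above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    family: str      # "speech-denoise" or "speech-denoise-dereverb"
    latency_ms: int
    micro: bool
    rtf: float       # real-time factor from the Benchmarks table

    @property
    def filename(self) -> str:
        size = "micro-" if self.micro else ""
        return f"{self.family}-{size}{self.latency_ms}ms.v1.0.rapidly"


# Figures copied from the Benchmarks table (single core, mono, 48 kHz).
CATALOG = [
    Model("speech-denoise", 11, False, 13),
    Model("speech-denoise", 21, False, 13),
    Model("speech-denoise", 32, False, 27),
    Model("speech-denoise", 32, True, 125),
    Model("speech-denoise", 96, False, 29),
    Model("speech-denoise-dereverb", 11, False, 12),
    Model("speech-denoise-dereverb", 21, False, 12),
    Model("speech-denoise-dereverb", 32, False, 25),
    Model("speech-denoise-dereverb", 32, True, 115),
    Model("speech-denoise-dereverb", 96, False, 27),
]


def pick_model(family: str, max_latency_ms: int, min_rtf: float = 1.0) -> Optional[Model]:
    """Pick a model that fits the latency limit and the required real-time factor."""
    candidates = [
        m for m in CATALOG
        if m.family == family and m.latency_ms <= max_latency_ms and m.rtf >= min_rtf
    ]
    if not candidates:
        return None
    # Assumed ranking: prefer full models over micro, then higher latency.
    return max(candidates, key=lambda m: (not m.micro, m.latency_ms))


choice = pick_model("speech-denoise", max_latency_ms=32, min_rtf=50)
print(choice.filename if choice else "no model fits")
```

Raising `min_rtf` is one way to express a tight CPU budget: in this sketch, a requirement of 50x at 32 ms leaves only the micro variant. The benchmark figures apply to the CPU listed under Benchmark conditions, so the threshold would need adjusting for other hardware.
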
The full model catalog is on the Models page.
Memory considerations
Each RapidlyEngine instance loads its own copy of the model into memory. Model file sizes are listed in the Models catalog. Running multiple instances in parallel for separate streams or chunks multiplies memory by the instance count. See Parallel processing for when this pattern is appropriate.
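
A back-of-the-envelope estimate of that multiplication is sketched below. It uses the model file size as a stand-in for the in-memory model size, which is an assumption: the actual footprint per instance, including working buffers, is not stated here and may be larger. The function name is hypothetical.

```python
def estimated_model_memory_kb(model_file_kb: int, instances: int) -> int:
    """Approximate memory held by model copies across `instances` engines.

    Uses file size as a proxy for the loaded model size (an assumption);
    treat the result as a rough lower bound.
    """
    if instances < 0:
        raise ValueError("instances must not be negative")
    return model_file_kb * instances


# Eight instances of an 854 KB model hold roughly 6832 KB of model data.
print(estimated_model_memory_kb(854, 8))
```
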
On-device vs cloud
Real-time, on-device use is the canonical case. The real-time factors in the catalog give the engine plenty of headroom even on phones. For cloud workloads that split a long file into chunks and run many instances on shared cores, see Parallel processing.
Benchmarks
The table below lists the real-time factor for each model in the v1.0 catalog.
| Model | Latency | File size | Real-time factor |
|---|---|---|---|
| speech-denoise-11ms.v1.0.rapidly | 11 ms | 615 KB | 13x |
| speech-denoise-21ms.v1.0.rapidly | 21 ms | 851 KB | 13x |
| speech-denoise-32ms.v1.0.rapidly | 32 ms | 854 KB | 27x |
| speech-denoise-micro-32ms.v1.0.rapidly | 32 ms | 241 KB | 125x |
| speech-denoise-96ms.v1.0.rapidly | 96 ms | 925 KB | 29x |
| speech-denoise-dereverb-11ms.v1.0.rapidly | 11 ms | 615 KB | 12x |
| speech-denoise-dereverb-21ms.v1.0.rapidly | 21 ms | 851 KB | 12x |
| speech-denoise-dereverb-32ms.v1.0.rapidly | 32 ms | 854 KB | 25x |
| speech-denoise-dereverb-micro-32ms.v1.0.rapidly | 32 ms | 241 KB | 115x |
| speech-denoise-dereverb-96ms.v1.0.rapidly | 96 ms | 926 KB | 27x |
Benchmark conditions
- CPU: single core of an AMD Ryzen AI MAX+ 395
- Audio: mono input at the model's training sample rate (48 kHz)
- Channel count: 1
- Build: v1.0 release binaries
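
To relate these figures to capacity, the sketch below estimates how many mono real-time streams one core could carry at a given real-time factor. It is only meaningful for the benchmark CPU and conditions listed above. The `usable_fraction` safety margin is a hypothetical parameter, and scheduling overhead and resampling are ignored.

```python
import math


def max_streams_per_core(real_time_factor: float, usable_fraction: float = 0.5) -> int:
    """Estimate concurrent mono real-time streams on a single core.

    `usable_fraction` is the share of the core reserved for audio
    processing (a hypothetical safety margin, not an SDK setting).
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.floor(real_time_factor * usable_fraction)


# At 25x with half the core reserved for other work: 12 streams.
print(max_streams_per_core(25))
```
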
Other platforms
Real-time factors on Apple Silicon, Intel Mac, ARM-based servers, iPhones, and Android devices are not yet published in this catalog. The engine targets real-time use on every platform listed in Overview. Contact us for measured numbers on your target hardware.