Performance

The Rapidly SDK is engineered for real-time embedded use. Every model in the catalog runs many times faster than real time on a single CPU core. The factors below determine actual throughput on a target device.

Real-time factor

The real-time factor is the ratio of audio duration to processing time. A real-time factor of 25x means one second of audio takes 1/25 of a second, or 40 milliseconds, to process. Anything above 1x is real-time capable. The higher the factor, the more headroom for additional tracks or other work on the same core.
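The arithmetic above can be written out as a short Python sketch. The function names are illustrative and are not part of the Rapidly SDK.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Ratio of audio duration to the time spent processing it."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_ms_per_audio_second(rtf: float) -> float:
    """Milliseconds of compute needed for one second of audio."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1000.0 / rtf


def core_share(rtf: float) -> float:
    """Fraction of one core a single live stream keeps busy."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf
```

For example, `processing_ms_per_audio_second(25)` gives 40.0, and `core_share(25)` gives 0.04, meaning one live stream at 25x occupies about 4% of a core.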

Per-model real-time factors are listed in the Benchmarks section below.

What affects throughput

Factor	How it changes throughput
Model latency variant (11, 21, 32, 96 ms)	Higher-latency variants tend to have higher throughput because they process larger blocks at a time.
Model size variant (`micro` vs full)	The `micro` variant has a real-time factor several times higher than the full model, at a small quality cost.
Channel count	The engine processes channels independently, so stereo costs roughly twice as much as mono.
Sample rate	The engine resamples to the model's training sample rate. Higher input rates add resampling cost.
Hardware acceleration	All modern desktop and mobile targets benefit. See Overview.
CPU architecture	ARM and Intel both perform well. NPU and GPU offload are not currently used.
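As a rough planning aid, the channel-count row can be turned into a back-of-the-envelope estimate. The sketch below assumes cost scales linearly with channel count and ignores resampling and any other overhead. The function names and the `core_budget` parameter are hypothetical and are not part of the Rapidly SDK.

```python
import math


def estimated_rtf(mono_rtf: float, channels: int) -> float:
    """Estimate the real-time factor for a multi-channel stream.

    Assumption: each channel costs the same as a mono stream.
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return mono_rtf / channels


def max_streams_per_core(mono_rtf: float, channels: int, core_budget: float = 0.5) -> int:
    """Estimate how many live streams fit in `core_budget` of one core.

    `core_budget` is the fraction of the core reserved for audio processing;
    the default of 0.5 is an arbitrary illustrative choice.
    """
    if not 0 < core_budget <= 1:
        raise ValueError("core_budget must be in (0, 1]")
    return math.floor(estimated_rtf(mono_rtf, channels) * core_budget)
```

With a mono real-time factor of 27x, a stereo stream is estimated at 13.5x, and about 6 such streams fit in half a core. Treat these as starting points and confirm them by measuring on the target device.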

Picking a model for the CPU budget

For each latency, there is a trade-off between processing strength and CPU cost. As a rough guide:

Tight latency and low CPU budget. Use a `micro` variant. Example: `speech-denoise-micro-32ms.v1.0.rapidly` is compact (241 KB) and has a very high real-time factor (125x in the Benchmarks table below).
Tight latency and quality matters. Use the full 32 ms variants, which give strong noise and reverb reduction at moderate latency.
Lowest possible latency. Use the 11 ms or 21 ms variants, which trade some suppression strength for lower delay.
Offline or post-production. Use the 96 ms variants for the highest fidelity.
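The guide above can be sketched as a small selection helper. The latency and real-time factor figures are copied from the Benchmarks table on this page. The `pick_model` function and its preference order (full models before `micro`, then the longest latency that fits) are assumptions made for illustration, not SDK behavior.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    latency_ms: int
    rtf: float       # benchmarked real-time factor (single core, mono)
    dereverb: bool
    micro: bool


CATALOG = [
    Model("speech-denoise-11ms.v1.0.rapidly", 11, 13, False, False),
    Model("speech-denoise-21ms.v1.0.rapidly", 21, 13, False, False),
    Model("speech-denoise-32ms.v1.0.rapidly", 32, 27, False, False),
    Model("speech-denoise-micro-32ms.v1.0.rapidly", 32, 125, False, True),
    Model("speech-denoise-96ms.v1.0.rapidly", 96, 29, False, False),
    Model("speech-denoise-dereverb-11ms.v1.0.rapidly", 11, 12, True, False),
    Model("speech-denoise-dereverb-21ms.v1.0.rapidly", 21, 12, True, False),
    Model("speech-denoise-dereverb-32ms.v1.0.rapidly", 32, 25, True, False),
    Model("speech-denoise-dereverb-micro-32ms.v1.0.rapidly", 32, 115, True, True),
    Model("speech-denoise-dereverb-96ms.v1.0.rapidly", 96, 27, True, False),
]


def pick_model(max_latency_ms: int, min_rtf: float, dereverb: bool = False) -> Optional[Model]:
    """Return a model that fits the latency and CPU budget, or None.

    Assumed preference: full models over micro, then the longest latency
    that still fits, following the rough guide in this section.
    """
    candidates = [
        m for m in CATALOG
        if m.dereverb == dereverb
        and m.latency_ms <= max_latency_ms
        and m.rtf >= min_rtf
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: (not m.micro, m.latency_ms))
```

For instance, `pick_model(32, 20)` selects the full 32 ms denoise model, while raising the requirement to `min_rtf=100` leaves only the `micro` variant. The real-time factors apply to the benchmark hardware only, so scale `min_rtf` to the target device.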

The full model catalog is on the Models page.

Memory considerations

Each RapidlyEngine instance loads its own copy of the model into memory. Model file sizes are listed in the Models catalog. Running multiple instances in parallel for separate streams or chunks multiplies memory by the instance count. See Parallel processing for when this pattern is appropriate.
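A minimal sketch of the multiplication, assuming the in-memory model copy is about the size of the model file. It counts model copies only. Working buffers and other per-instance overhead are not covered by this page and are left out. The function name is illustrative.

```python
def model_memory_kb(model_file_size_kb: int, instances: int) -> int:
    """Estimate memory used by model copies across parallel engine instances.

    Assumption: each instance holds one copy roughly the size of the model file.
    """
    if instances < 0:
        raise ValueError("instances must not be negative")
    return model_file_size_kb * instances
```

Eight instances of the 854 KB `speech-denoise-32ms.v1.0.rapidly` model come to about 6832 KB of model data under this assumption.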

On-device vs cloud

Real-time use on-device is the canonical case. The real-time factors in the catalog give the engine plenty of headroom even on phones. For cloud workloads that split a long file into chunks and fan them out across many instances sharing cores, see Parallel processing.

Benchmarks

The table below lists the real-time factor for each model in the v1.0 catalog.

Model	Latency	File size	Real-time factor
`speech-denoise-11ms.v1.0.rapidly`	11 ms	615 KB	13x
`speech-denoise-21ms.v1.0.rapidly`	21 ms	851 KB	13x
`speech-denoise-32ms.v1.0.rapidly`	32 ms	854 KB	27x
`speech-denoise-micro-32ms.v1.0.rapidly`	32 ms	241 KB	125x
`speech-denoise-96ms.v1.0.rapidly`	96 ms	925 KB	29x
`speech-denoise-dereverb-11ms.v1.0.rapidly`	11 ms	615 KB	12x
`speech-denoise-dereverb-21ms.v1.0.rapidly`	21 ms	851 KB	12x
`speech-denoise-dereverb-32ms.v1.0.rapidly`	32 ms	854 KB	25x
`speech-denoise-dereverb-micro-32ms.v1.0.rapidly`	32 ms	241 KB	115x
`speech-denoise-dereverb-96ms.v1.0.rapidly`	96 ms	926 KB	27x

Benchmark conditions

CPU: single core of an AMD Ryzen AI MAX+ 395
Audio: mono input at the model's training sample rate (48 kHz)
Channel count: 1
Build: v1.0 release binaries

Other platforms

Real-time factors on Apple Silicon, Intel Mac, ARM-based servers, iPhones, and Android devices are not yet published in this catalog. The engine targets real-time use on every platform listed in Overview. Contact us for measured numbers on your target hardware.
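Until published numbers are available, a real-time factor can be estimated locally with a simple timing loop. The sketch below is a generic harness, not SDK code: `process_block` is a hypothetical stand-in for whatever call runs the engine on one block of audio, and `clock` is injectable so the harness itself can be tested.

```python
import time
from typing import Callable


def measure_real_time_factor(
    process_block: Callable[[], None],
    block_seconds: float,
    num_blocks: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time `num_blocks` calls and return audio duration / processing time.

    `process_block` is a placeholder for the caller's own processing call;
    `block_seconds` is the duration of audio each call consumes.
    """
    if block_seconds <= 0 or num_blocks <= 0:
        raise ValueError("block_seconds and num_blocks must be positive")
    start = clock()
    for _ in range(num_blocks):
        process_block()
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time too small to measure; use more blocks")
    return (block_seconds * num_blocks) / elapsed
```

For a comparison with the Benchmarks table, match its conditions: one core, mono input at 48 kHz, and enough blocks that warm-up effects are negligible.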
