Models

The Rapidly SDK ships with two model families: speech denoise and speech denoise + dereverb. Each family is available at four latencies (11, 21, 32, and 96 ms), plus a smaller `micro` variant of the 32 ms model for CPU-constrained scenarios.

All models are trained on audio sampled at 48 kHz. The engine accepts other sample rates and resamples automatically.
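
To relate the model latencies to the 48 kHz training rate, the conversion from milliseconds to samples can be sketched as below. This is plain arithmetic, not an SDK call: the function name `latency_in_samples` is made up for illustration, and the result says nothing about the engine's internal block size.

```python
# Illustrative arithmetic only; none of these names come from the Rapidly SDK.
MODEL_SAMPLE_RATE_HZ = 48_000      # training sample rate stated above
LATENCIES_MS = (11, 21, 32, 96)    # the four latency variants


def latency_in_samples(latency_ms: int, sample_rate_hz: int = MODEL_SAMPLE_RATE_HZ) -> int:
    """Convert a latency in milliseconds to a whole number of samples."""
    return round(latency_ms * sample_rate_hz / 1000)


samples_at_48k = {ms: latency_in_samples(ms) for ms in LATENCIES_MS}
print(samples_at_48k)  # {11: 528, 21: 1008, 32: 1536, 96: 4608}
```

Passing a different `sample_rate_hz` gives the equivalent sample count at another rate, for example 512 samples for 32 ms at 16 kHz.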

For real-time factors and other performance characteristics, see Performance.

Speech denoise + dereverb

Removes both background noise and room reverb. Each model outputs cleaned dialogue, reverb, and noise as three separate busses.

Variant	File	Size
96 ms	`speech-denoise-dereverb-96ms.v1.0.rapidly`	926 KB
32 ms	`speech-denoise-dereverb-32ms.v1.0.rapidly`	854 KB
32 ms `micro`	`speech-denoise-dereverb-micro-32ms.v1.0.rapidly`	241 KB
21 ms	`speech-denoise-dereverb-21ms.v1.0.rapidly`	851 KB
11 ms	`speech-denoise-dereverb-11ms.v1.0.rapidly`	615 KB

Pick by latency:

96 ms for high-quality dialogue restoration. Excellent noise and reverb reduction when low latency isn't critical.
32 ms for a balanced trade-off. Strong reverb and noise reduction with moderate latency. Suited to real-time speech enhancement where clarity and responsiveness both matter.
32 ms micro for scale. Strong reduction at a much higher real-time factor. Process more simultaneous streams per CPU with a small trade-off in suppression strength.
21 ms for responsiveness. Maintains strong suppression. Some transient noise may appear in highly dynamic environments.
11 ms for ultra-low latency live communication. Effective, with mild artifacts possible in very noisy or reverberant conditions.
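
The guidance above can be read as "take the highest-latency variant your budget allows". A minimal sketch of that rule follows, assuming the budget covers only the model's own latency and that a higher-latency variant is preferred whenever it fits. `pick_latency` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional

# Latency variants listed in the table above.
AVAILABLE_LATENCIES_MS = (11, 21, 32, 96)


def pick_latency(budget_ms: float) -> Optional[int]:
    """Return the largest available latency that fits within budget_ms.

    Returns None when even the 11 ms variant exceeds the budget.
    This is a hypothetical helper for illustration, not an SDK function.
    """
    candidates = [ms for ms in AVAILABLE_LATENCIES_MS if ms <= budget_ms]
    return max(candidates) if candidates else None


print(pick_latency(40))  # 32
print(pick_latency(10))  # None
```

Choosing the `micro` variant is a separate decision: it has the same 32 ms latency and trades some suppression strength for lower CPU cost per stream.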

Speech denoise

Removes background noise from speech. Each model outputs cleaned dialogue and isolated noise as two separate busses.
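
When handling model output in application code, it can help to give the busses names. The sketch below is one way to do that, assuming the busses arrive as a sequence in the order they are listed in this page; the actual bus order and the SDK's output types are not specified here, so `BUS_LAYOUT` and `name_busses` are hypothetical.

```python
from typing import Dict, Sequence, TypeVar

Buffer = TypeVar("Buffer")

# Assumed bus order, taken from the order used in the prose descriptions.
# Check the SDK reference before relying on it.
BUS_LAYOUT = {
    "speech-denoise": ("dialogue", "noise"),
    "speech-denoise-dereverb": ("dialogue", "reverb", "noise"),
}


def name_busses(family: str, buffers: Sequence[Buffer]) -> Dict[str, Buffer]:
    """Pair each output buffer with its assumed bus name."""
    names = BUS_LAYOUT[family]
    if len(buffers) != len(names):
        raise ValueError(
            f"{family} is expected to produce {len(names)} busses, got {len(buffers)}"
        )
    return dict(zip(names, buffers))


# Placeholder lists stand in for real audio buffers.
named = name_busses("speech-denoise", [[0.1, 0.2], [0.0, 0.0]])
print(sorted(named))  # ['dialogue', 'noise']
```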

Variant	File	Size
96 ms	`speech-denoise-96ms.v1.0.rapidly`	925 KB
32 ms	`speech-denoise-32ms.v1.0.rapidly`	854 KB
32 ms `micro`	`speech-denoise-micro-32ms.v1.0.rapidly`	241 KB
21 ms	`speech-denoise-21ms.v1.0.rapidly`	851 KB
11 ms	`speech-denoise-11ms.v1.0.rapidly`	615 KB
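
The file names in both tables follow one pattern: family, optional `micro-`, latency, version, and the `.rapidly` extension. A small sketch that rebuilds a file name from those parts is shown below. `model_filename` is a hypothetical helper inferred from the tables, not an SDK function, and the pattern may not hold for other versions.

```python
FAMILIES = ("speech-denoise", "speech-denoise-dereverb")
LATENCIES_MS = (11, 21, 32, 96)


def model_filename(family: str, latency_ms: int, micro: bool = False,
                   version: str = "v1.0") -> str:
    """Build a model file name following the pattern seen in the tables."""
    if family not in FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    if latency_ms not in LATENCIES_MS:
        raise ValueError(f"no {latency_ms} ms variant is listed")
    if micro and latency_ms != 32:
        # The tables list a micro variant only for the 32 ms model.
        raise ValueError("micro is only listed for the 32 ms model")
    size = "micro-" if micro else ""
    return f"{family}-{size}{latency_ms}ms.{version}.rapidly"


print(model_filename("speech-denoise", 96))
# speech-denoise-96ms.v1.0.rapidly
print(model_filename("speech-denoise-dereverb", 32, micro=True))
# speech-denoise-dereverb-micro-32ms.v1.0.rapidly
```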

Pick by latency:

96 ms for high-fidelity noise reduction that preserves natural speech tone. Ideal for recordings or post-processing.
32 ms for an excellent balance between speed and quality. Suited to real-time use and production workflows.
32 ms micro for scale. Nearly identical quality at a much higher real-time factor. Run more simultaneous streams without losing clarity.
21 ms for compact, efficient real-time use. Clean noise reduction with minimal delay. Best in moderate-noise environments.
11 ms for ultra-low latency live speech and conferencing. Retains the room's natural reverb and applies less aggressive noise reduction.
