Overview

What the Rapidly SDK is, where it runs, and how it's distributed.

Rapidly is a real-time audio separation SDK. The engine runs entirely on the end-user's device across Linux, Windows, macOS, iOS, and Android, with idiomatic bindings for C / C++, Python, Swift, and Kotlin. Inference latency starts at 11 ms.

Vocabulary

Three terms used across these docs.

Rapidly SDK: The package you install.
Inference: What the SDK does at runtime, on the end-user's CPU.
Engine: One running instance processing a single audio stream on a single CPU thread.

If you need more than one engine, see One vs multiple engines and Parallel processing.
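
The "one engine, one stream, one thread" model can be sketched as follows. This is only an illustration of the ownership pattern: `PlaceholderEngine`, `run_stream`, and `run_streams` are hypothetical names invented for this example and are not part of the Rapidly API. The placeholder simply passes audio through.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class PlaceholderEngine:
    """Hypothetical stand-in for one engine instance. NOT the Rapidly API."""

    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self.frames_processed = 0

    def process(self, frame: List[float]) -> List[float]:
        # Pass-through: a real engine would return separated audio here.
        self.frames_processed += 1
        return list(frame)


def run_stream(stream_id: str, frames: List[List[float]]) -> Tuple[str, int]:
    # Each stream gets its own engine, and only this worker thread touches it.
    engine = PlaceholderEngine(stream_id)
    for frame in frames:
        engine.process(frame)
    return stream_id, engine.frames_processed


def run_streams(streams: Dict[str, List[List[float]]]) -> Dict[str, int]:
    # One worker thread per stream, so no engine is shared between threads.
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        results = pool.map(lambda item: run_stream(*item), streams.items())
        return dict(results)
```

The point of the sketch is that engines are never shared: a second stream means a second engine on its own thread, rather than interleaving two streams through one instance.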

Key strengths

On-device inference

Audio never leaves the device. The full engine runs locally, with no cloud round-trip for inference.

Real-time, low latency

Latency ranges from 11 ms to 96 ms across the model catalog. Processing runs up to 125x faster than real time on a single CPU core.
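
To make these figures concrete, the arithmetic can be sketched as below. Two assumptions are made here that the text does not state: "125x" is read as audio duration divided by processing time, and a 48 kHz sample rate is used purely for illustration. The function names are hypothetical.

```python
def processing_time_ms(audio_ms: float, speedup: float) -> float:
    """Time needed to process `audio_ms` of audio at `speedup` times real time."""
    return audio_ms / speedup


def latency_in_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Number of samples spanned by a given latency at a given sample rate."""
    return round(latency_ms * sample_rate_hz / 1000)


# One second of audio at 125x real time (assumed interpretation).
print(processing_time_ms(1000, 125))   # 8.0
# The 11 ms and 96 ms latencies at an assumed 48 kHz sample rate.
print(latency_in_samples(11, 48_000))  # 528
print(latency_in_samples(96, 48_000))  # 4608
```

Under these assumptions, one second of audio takes about 8 ms of CPU time on the fastest model, which leaves most of a core free for the rest of the application.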

Five platforms

Linux, Windows, macOS, iOS, and Android. The same engine runs everywhere.

Four language bindings

C / C++, Python, Swift, and Kotlin. Distributed via the standard package manager for each language.

Hardware-accelerated math

Apple Accelerate (vDSP), Intel IPP, and ARM NEON paths selected automatically per platform.

Small footprint

Compact models from 241 KB. Suits CPU-constrained embedded and mobile targets.

Supported platforms

Platform | Architectures | Distribution
Linux | x64, arm64 | Shared library in bin/linux-x64/ and bin/linux-arm64/
Windows | x64, x86 | DLL plus import library in bin/windows-x64/ and bin/windows-x86/
macOS | Universal (arm64 + x86_64) | dylib in bin/macos/, or the signed RapidlyEngine.xcframework, or Swift Package Manager
iOS | arm64 device and Simulator | RapidlyEngine.xcframework, or Swift Package Manager
Android | arm64-v8a | Maven Central (io.rapidly:rapidly-sdk:1.0), or .aar from the GitHub Release
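
For desktop targets, the directory layout in the table above can be turned into a small lookup. The sketch below is a hypothetical helper, not something the SDK ships: `binary_dir` and its mapping tables are invented for illustration, and it resolves only the directory, since library file names are not listed here.

```python
import platform
from typing import Optional

# Directory per (platform, architecture), copied from the table above.
BINARY_DIRS = {
    ("linux", "x64"): "bin/linux-x64/",
    ("linux", "arm64"): "bin/linux-arm64/",
    ("windows", "x64"): "bin/windows-x64/",
    ("windows", "x86"): "bin/windows-x86/",
    ("macos", "arm64"): "bin/macos/",  # universal binary
    ("macos", "x64"): "bin/macos/",    # universal binary
}

# Normalise the values reported by platform.system() / platform.machine().
_SYSTEMS = {"Linux": "linux", "Windows": "windows", "Darwin": "macos"}
_ARCHES = {
    "x86_64": "x64", "amd64": "x64",
    "aarch64": "arm64", "arm64": "arm64",
    "x86": "x86", "i386": "x86", "i686": "x86",
}


def binary_dir(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the release directory for a desktop platform and architecture."""
    system = system or platform.system()
    machine = machine or platform.machine()
    key = (_SYSTEMS.get(system), _ARCHES.get(machine.lower()))
    if key not in BINARY_DIRS:
        raise ValueError(f"no pre-built binary listed for {system}/{machine}")
    return BINARY_DIRS[key]
```

Calling `binary_dir()` with no arguments uses the current machine; passing explicit values such as `binary_dir("Linux", "aarch64")` is handy in packaging scripts that build for another target.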

Minimum requirements

  • iOS 14 or later, iPadOS 14 or later
  • macOS 11 or later
  • Android minSdk 26 (Android 8.0)

Linux and Windows do not have a hard minimum; any reasonably modern distribution or release should work.
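
A build or CI script could encode these minimums as a simple check. The sketch below is illustrative only: `meets_minimum` is a hypothetical helper, versions are compared as major OS versions, and for Android the number is the API level (minSdk), not the marketing version.

```python
# Minimums from the list above. Android is expressed as an API level.
MINIMUMS = {"ios": 14, "ipados": 14, "macos": 11, "android": 26}

# Platforms with no hard minimum stated.
NO_HARD_MINIMUM = {"linux", "windows"}


def meets_minimum(platform_name: str, version: int) -> bool:
    """Check a major OS version (or Android API level) against the stated minimum."""
    name = platform_name.lower()
    if name in NO_HARD_MINIMUM:
        return True  # version is ignored; no hard minimum is documented
    if name not in MINIMUMS:
        raise ValueError(f"unsupported platform: {platform_name}")
    return version >= MINIMUMS[name]


print(meets_minimum("android", 26))  # True
print(meets_minimum("iOS", 13))      # False
```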

What the SDK includes

Component | Purpose
Native engine binary | The cross-platform core that loads models and runs inference. One binary per platform.
Public C header (RapidlyEngine.h) | The stable API surface that every binding wraps.
Language bindings | Idiomatic wrappers for C / C++, Python, Swift, and Kotlin.
Pre-trained models | .rapidly files for speech denoising and dereverberation, in multiple latency variants. See Models.
Examples | Working integrations for file processing and embedded targets, shipped in the GitHub Release.

Distribution

The SDK ships as a GitHub Release with pre-built binaries, the Apple xcframework, the Android .aar, the public header, and the bindings source. Customers can also pull bindings directly from each language's package manager:

Channel | What's there
GitHub Release | Pre-built binaries, the xcframework, the .aar, the public header, and the bindings source.
PyPI | pip install rapidly for the Python binding.
Swift Package Manager | https://github.com/rapidly-labs/rapidly-sdk for the Swift binding.
Maven Central | io.rapidly:rapidly-sdk:1.0 for the Kotlin binding.

Hardware acceleration

The engine selects the fastest available math path per platform at runtime:

Platform | Acceleration
Apple (macOS, iOS) | Accelerate framework (vDSP)
Intel desktop and server | Intel Performance Primitives (IPP)
ARM (Linux arm64, Android, Apple Silicon) | NEON intrinsics
Other targets | Optimised C++ fallback

No configuration is required. The engine picks the best path per architecture on its own.

Coming soon

WebAssembly. WASM is on the roadmap to enable in-browser audio processing without a server round-trip. Target use cases: web apps, SaaS products, browser-based conferencing, live streaming, and smart TVs running web-based platforms like Samsung Tizen or LG webOS.

NPU offload. Hardware-accelerated inference on chips with dedicated neural acceleration units.

Licensing model

The SDK enforces license entitlements locally. No network access is required. Without a covering license, the engine still runs and loads models, but its output is watermarked. See Pricing for licensing options.