Audio Encoding

Flex Video supports optional audio encoding alongside video streams using Codec2, a low-bitrate speech codec designed specifically for voice communication.

About Codec2

Codec2 is an open-source speech codec that operates at extremely low bitrates; the modes offered in Flex Video range from 1,200 to 3,200 bits per second. It is optimized for voice and speech fidelity, not music or general audio, which makes it well suited to tactical voice communication over bandwidth-constrained networks.

Mode	Bitrate	Use Case
3200	3.2 kbps	Highest voice quality
2400	2.4 kbps	Default — good balance of quality and bandwidth
1600	1.6 kbps	Reduced quality, lower bandwidth
1400	1.4 kbps	Low bandwidth environments
1300	1.3 kbps	Very low bandwidth
1200	1.2 kbps	Minimum bandwidth
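To put the bitrates in the table into perspective, the following minimal sketch converts each mode into raw payload bytes. It covers only the Codec2 payload; container and transport overhead (RTSP, FlexMux) are not documented here and are not included. The names `CODEC2_MODES_BPS` and `payload_bytes` are illustrative, not part of Flex Video.

```python
# Codec2 modes from the table above, mapped to bits per second.
CODEC2_MODES_BPS = {
    "3200": 3200,
    "2400": 2400,
    "1600": 1600,
    "1400": 1400,
    "1300": 1300,
    "1200": 1200,
}


def payload_bytes(mode: str, seconds: float) -> float:
    """Return the raw Codec2 payload size in bytes for the given duration.

    Illustrative helper: excludes any container or network overhead.
    """
    return CODEC2_MODES_BPS[mode] * seconds / 8


if __name__ == "__main__":
    for mode in CODEC2_MODES_BPS:
        kilobytes_per_minute = payload_bytes(mode, 60) / 1000
        print(f"mode {mode}: {kilobytes_per_minute:.1f} kB per minute")
```

At the default mode of 2400, one second of audio is 300 bytes of payload, so a full minute of speech fits in 18 kB.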

Audio Sources

Flex Video supports three audio source types:

Stream Extraction

Extract audio from the incoming video stream (e.g., an MPEG-TS stream with embedded audio). This is the default source type.

Requires the input stream to contain an audio track
Not available when using a test video source

Local Device

Capture audio from a local ALSA device such as a USB microphone.

Use the web interface device picker or query GET /flex/audio-devices to list available capture devices
Device paths use the ALSA format (e.g., dsnoop:0,0, plughw:0,0, hw:1,0, default)
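As a rough sketch of working with these device paths from a script, the example below queries the endpoint and splits an ALSA device string into its parts. The base URL is a placeholder, and the assumption that the endpoint returns JSON is ours; the response schema is not described in this document, so the body is returned without interpretation. Both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def list_audio_devices(base_url: str):
    """Call GET /flex/audio-devices and return the decoded body.

    Assumption: the endpoint responds with JSON. The schema is not
    documented here, so the result is returned as-is.
    """
    url = f"{base_url.rstrip('/')}/flex/audio-devices"
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def parse_alsa_device(path: str):
    """Split an ALSA device string into (plugin, card, device).

    Parts that are absent (as in "default") are returned as None.
    Card and device are kept as strings because ALSA also allows names.
    """
    plugin, separator, rest = path.partition(":")
    if not separator:
        return plugin, None, None
    card, _, device = rest.partition(",")
    return plugin, card, device or None


if __name__ == "__main__":
    for example in ("dsnoop:0,0", "plughw:0,0", "hw:1,0", "default"):
        print(example, "->", parse_alsa_device(example))
```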

Test Tone

Generate a synthetic audio signal for testing purposes.

Wave types (numbered 0–12) include sine, square, sawtooth, triangle, and silence
Audio processing filters are automatically skipped for test sources
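For readers unfamiliar with the named wave types, here is a minimal sketch of how each shape can be generated. It does not reflect how Flex Video produces its test tone; the 8 kHz sample rate is chosen because Codec2 operates on 8 kHz audio, and the function names are illustrative.

```python
import math

SAMPLE_RATE = 8000  # assumption: 8 kHz, the rate Codec2 operates on


def wave_sample(wave: str, phase: float) -> float:
    """Return one sample in [-1, 1] for a phase in [0, 1)."""
    if wave == "sine":
        return math.sin(2 * math.pi * phase)
    if wave == "square":
        return 1.0 if phase < 0.5 else -1.0
    if wave == "sawtooth":
        return 2.0 * phase - 1.0
    if wave == "triangle":
        return 1.0 - 4.0 * abs(phase - 0.5)
    if wave == "silence":
        return 0.0
    raise ValueError(f"unknown wave type: {wave}")


def generate(wave: str, freq_hz: float, seconds: float) -> list:
    """Generate a list of float samples for the given wave type."""
    count = int(SAMPLE_RATE * seconds)
    return [
        wave_sample(wave, (i * freq_hz / SAMPLE_RATE) % 1.0)
        for i in range(count)
    ]


if __name__ == "__main__":
    tone = generate("sine", 440, 0.5)
    print(len(tone), "samples")
```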

Audio Processing

Audio passes through a configurable processing chain before encoding. The processing order matters — each stage builds on the previous one:

1. High-Pass Noise Filter

A 200 Hz high-pass filter that removes wind noise and low-frequency rumble. Enabled by default.

Attenuates frequencies below 200 Hz
Effective for outdoor environments with wind or machinery noise
Skipped for test sources
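To illustrate what a 200 Hz high-pass stage does to a signal, the sketch below implements a simple first-order (RC-style) high-pass filter. The filter design used by Flex Video is not stated in this document, so this is a stand-in technique under the assumption of 8 kHz audio, not the actual implementation.

```python
import math


def high_pass(samples, cutoff_hz=200.0, sample_rate=8000):
    """Apply a first-order high-pass filter to a sequence of floats.

    Uses the standard discrete RC recurrence:
        y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)

    output = []
    previous_input = 0.0
    previous_output = 0.0
    for x in samples:
        y = alpha * (previous_output + x - previous_input)
        output.append(y)
        previous_input, previous_output = x, y
    return output


def sine(freq_hz, count, sample_rate=8000):
    """Helper: generate `count` samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


if __name__ == "__main__":
    rumble = high_pass(sine(50, 800))    # well below the cutoff
    voice = high_pass(sine(1000, 800))   # well above the cutoff
    print("50 Hz peak:", max(abs(s) for s in rumble[400:]))
    print("1 kHz peak:", max(abs(s) for s in voice[400:]))
```

A first-order filter rolls off gently, so a 50 Hz rumble is reduced rather than eliminated; steeper filters trade more computation for stronger rejection.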

2. Noise Suppression

ML-based noise suppression that removes background noise while preserving voice clarity. Enabled by default.

Suppression levels:

Level	Aggressiveness	Best For
1	Low	Quiet environments with minimal noise
2	Moderate	Default — general purpose
3	High	Noisy environments
4	Aggressive	Very noisy environments (may affect voice quality)

Skipped for test sources

3. Automatic Gain Control (AGC)

Automatic gain control with a built-in limiter. Normalizes audio levels so quiet voices are amplified and loud sounds are capped. Enabled by default.

Skipped for test sources
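The idea of gain normalization followed by a limiter can be sketched as follows. This is a deliberately simplified, block-based version: the target level, maximum gain, and limiter threshold are assumed values chosen for illustration, and real AGC implementations adapt gain smoothly over time rather than per block.

```python
import math


def agc(samples, target_rms=0.1, max_gain=10.0, limit=0.9):
    """Normalize a block of float samples toward target_rms, then hard-limit.

    All three parameters are illustrative assumptions, not Flex Video settings.
    """
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Quiet blocks are amplified (up to max_gain); loud blocks are attenuated.
    gain = max_gain if rms == 0 else min(max_gain, target_rms / rms)
    # The limiter caps any peaks that remain above the threshold.
    return [max(-limit, min(limit, s * gain)) for s in samples]


if __name__ == "__main__":
    quiet_voice = [0.01, -0.01] * 100
    print("quiet peak after AGC:", max(agc(quiet_voice)))
```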

4. Volume Adjustment

Optional manual volume control applied after all other processing.

Range: 0.0 (mute) to 2.0 (double volume)
1.0 = normal level
When not set, no volume adjustment is applied
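The volume rules above amount to a simple multiplication with a range check. A minimal sketch, assuming samples are floats and that an unset volume is represented as `None` (the function name and the choice to raise on out-of-range values are ours):

```python
from typing import List, Optional, Sequence


def apply_volume(samples: Sequence[float], volume: Optional[float]) -> List[float]:
    """Scale samples by volume; leave them unchanged when volume is None."""
    if volume is None:
        # Not set: no volume adjustment is applied.
        return list(samples)
    if not 0.0 <= volume <= 2.0:
        raise ValueError("volume must be between 0.0 and 2.0")
    return [s * volume for s in samples]


if __name__ == "__main__":
    print(apply_volume([0.25, -0.5], 2.0))   # double volume
    print(apply_volume([0.25, -0.5], None))  # unchanged
```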

Output Behavior

Audio output depends on the transport type:

RTSP Output

Audio is included as an additional stream alongside the video in the RTSP output. Receivers that support Codec2 can decode it; others will see only the video stream. Works with any video codec (H.264, H.265, AV1).

UDP / Multicast / TCP Output

Audio is multiplexed with video using the FlexMux container format. This requires AV1 as the video codec — the pipeline validator will reject configurations that combine audio with H.264 or H.265 over UDP/multicast/TCP output.
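The rule can be expressed as a small pre-flight check, which may be useful when generating pipeline configurations from a script. This is a sketch of the constraint as described here, not the pipeline validator itself; the function name, argument names, and string values are hypothetical.

```python
from typing import Optional

# Transports that carry audio via FlexMux, per the description above.
FLEXMUX_TRANSPORTS = {"udp", "multicast", "tcp"}


def check_audio_config(transport: str, video_codec: str, audio_enabled: bool) -> Optional[str]:
    """Return an error message if the combination is not allowed, else None."""
    if not audio_enabled:
        return None
    if transport.lower() in FLEXMUX_TRANSPORTS and video_codec.lower() != "av1":
        return (
            f"audio over {transport} requires AV1 video "
            f"(got {video_codec})"
        )
    # RTSP carries audio as a separate stream, so any video codec is fine.
    return None


if __name__ == "__main__":
    print(check_audio_config("udp", "h264", True))
    print(check_audio_config("rtsp", "h264", True))
```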

Configuring Audio in the Web Interface

1. Open the Add Pipeline or Edit Pipeline page
2. Expand the Audio section
3. Check Enable Audio Encoding
4. Select your audio source (Stream, Local, or Test)
5. For Local source, select the ALSA device from the dropdown
6. For Test source, select a wave type
7. Choose a Codec2 mode (default: 2400)
8. Adjust processing options as needed:
   - Noise filter, noise suppression (with level), AGC, and volume
9. Start the pipeline

FlexMux Constraint

When using UDP, multicast, or TCP output with audio, the video codec must be set to AV1. The web interface will show a validation error if this constraint is not met.

Stream Audio

When using "Stream" as the audio source, your input must contain an audio track. If no audio is detected, the pipeline will start without audio and log a warning.

For information on receiving and playing back streams with audio, including required plugins, see Receiving & Playback.