Audio Encoding
Flex Video supports optional audio encoding alongside video streams using Codec2, a low-bitrate speech codec designed for voice communication.
About Codec2
Codec2 is an open-source speech codec that operates at extremely low bitrates (1,200 to 3,200 bits per second). It is optimized for speech, not music or general audio, which makes it well suited to tactical voice communication over bandwidth-constrained networks.
| Mode | Bitrate | Use Case |
|---|---|---|
| 3200 | 3.2 kbps | Highest voice quality |
| 2400 | 2.4 kbps | Default — good balance of quality and bandwidth |
| 1600 | 1.6 kbps | Reduced quality, lower bandwidth |
| 1400 | 1.4 kbps | Low bandwidth environments |
| 1300 | 1.3 kbps | Very low bandwidth |
| 1200 | 1.2 kbps | Minimum bandwidth |
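To put these bitrates in perspective, the sketch below converts each mode into raw payload bytes per second and per minute. It counts only the Codec2 payload; container and transport overhead (RTSP or FlexMux) is not included, and the names `CODEC2_BITRATES` and `payload_bytes` are illustrative rather than part of Flex Video.

```python
# Codec2 modes from the table above, mapped to bitrate in bits per second.
CODEC2_BITRATES = {3200: 3200, 2400: 2400, 1600: 1600, 1400: 1400, 1300: 1300, 1200: 1200}


def payload_bytes(mode: int, seconds: float = 1.0) -> float:
    """Raw Codec2 payload in bytes for `seconds` of audio (no container overhead)."""
    return CODEC2_BITRATES[mode] * seconds / 8


for mode in CODEC2_BITRATES:
    per_second = payload_bytes(mode)
    per_minute_kb = payload_bytes(mode, 60) / 1000
    print(f"mode {mode}: {per_second:.1f} B/s, {per_minute_kb:.2f} kB/min")
```

At the default mode 2400, one minute of speech amounts to about 18 kB of payload.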
Audio Sources
Flex Video supports three audio source types:
Stream Extraction
Extract audio from the incoming video stream (e.g., an MPEG-TS stream with embedded audio). This is the default source type.
- Requires the input stream to contain an audio track
- Not available when using a test video source
Local Device
Capture audio from a local ALSA device such as a USB microphone.
- Use the web interface device picker or query `GET /flex/audio-devices` to list available capture devices
- Device paths use the ALSA format (e.g., `dsnoop:0,0`, `plughw:0,0`, `hw:1,0`, `default`)
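The device listing can also be queried from a script. The following is a minimal sketch using only the Python standard library; the host address in the usage comment is hypothetical, and because the response format is not described here, the function returns the raw body instead of parsing it.

```python
import urllib.request


def audio_devices_url(base_url: str) -> str:
    """Build the device-listing URL from the server's base URL."""
    return base_url.rstrip("/") + "/flex/audio-devices"


def fetch_audio_devices(base_url: str, timeout: float = 5.0) -> str:
    """Return the raw response body of GET /flex/audio-devices as text."""
    with urllib.request.urlopen(audio_devices_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (hypothetical address, replace with your Flex Video host):
# print(fetch_audio_devices("http://192.168.1.50:8080"))
```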
Test Tone
Generate a synthetic audio signal for testing purposes.
- Wave types include sine, square, sawtooth, triangle, and silence (wave type values range from 0 to 12)
- Audio processing filters are automatically skipped for test sources
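For readers unfamiliar with the named wave shapes, the sketch below generates one 20 ms block of each. It is an illustration only, not Flex Video's generator: the 8 kHz sample rate matches what Codec2 encodes, while the 440 Hz frequency, the amplitude, and the function name `make_tone` are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 8000  # Codec2 encodes 8 kHz mono speech


def make_tone(wave: str, freq_hz: float = 440.0, n_samples: int = 160,
              amplitude: float = 0.8) -> list:
    """Return n_samples floats in [-amplitude, amplitude] for the named wave type."""
    out = []
    for n in range(n_samples):
        phase = (n * freq_hz / SAMPLE_RATE) % 1.0  # position within one cycle, 0..1
        if wave == "sine":
            value = math.sin(2 * math.pi * phase)
        elif wave == "square":
            value = 1.0 if phase < 0.5 else -1.0
        elif wave == "sawtooth":
            value = 2.0 * phase - 1.0
        elif wave == "triangle":
            value = 1.0 - 4.0 * abs(phase - 0.5)
        elif wave == "silence":
            value = 0.0
        else:
            raise ValueError(f"unknown wave type: {wave}")
        out.append(amplitude * value)
    return out
```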
Audio Processing
Audio passes through a configurable processing chain before encoding. The processing order matters — each stage builds on the previous one:
1. High-Pass Noise Filter
A 200 Hz high-pass filter that removes wind noise and low-frequency rumble. Enabled by default.
- Cuts frequencies below 200 Hz
- Effective for outdoor environments with wind or machinery noise
- Skipped for test sources
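To illustrate what a 200 Hz high-pass stage does, here is a minimal first-order (RC) filter. The actual filter design used by Flex Video is not specified here, so treat the filter order, the 8 kHz sample rate, and the function name as assumptions.

```python
import math


def high_pass(samples, cutoff_hz=200.0, sample_rate=8000):
    """First-order (RC) high-pass filter; attenuates content below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        # Pass changes in the input, let steady (low-frequency) content decay.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

With this sketch, a 50 Hz rumble is strongly attenuated while a 2 kHz tone passes almost unchanged, which is the behavior the bullets above describe.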
2. Noise Suppression
ML-based noise suppression that removes background noise while preserving voice clarity. Enabled by default.
Suppression levels:
| Level | Aggressiveness | Best For |
|---|---|---|
| 1 | Low | Quiet environments with minimal noise |
| 2 | Moderate | Default — general purpose |
| 3 | High | Noisy environments |
| 4 | Aggressive | Very noisy environments (may affect voice quality) |
- Skipped for test sources
3. Automatic Gain Control (AGC)
Automatic gain control with a built-in limiter. Normalizes audio levels so quiet voices are amplified and loud sounds are capped. Enabled by default.
- Skipped for test sources
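The idea of gain control with a limiter can be sketched in a few lines. This is a simplified illustration, not Flex Video's AGC: the target level, maximum gain, limiter threshold, and smoothing factor are all assumed values, and samples are treated as floats in the range -1.0 to 1.0.

```python
def agc_with_limiter(samples, target=0.25, max_gain=8.0, limit=0.9, smoothing=0.01):
    """Track the signal level, scale it toward `target`, then hard-limit the result."""
    level = target  # running estimate of the absolute signal level
    out = []
    for x in samples:
        level += smoothing * (abs(x) - level)            # smooth level tracking
        gain = min(max_gain, target / max(level, 1e-6))  # boost quiet, cut loud
        y = x * gain
        out.append(max(-limit, min(limit, y)))           # limiter caps the peaks
    return out
```

A steady quiet input is amplified up to the gain ceiling, while a sudden loud input is capped by the limiter until the gain settles.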
4. Volume Adjustment
Optional manual volume control applied after all other processing.
- Range: 0.0 (mute) to 2.0 (double volume)
- 1.0 = normal level
- When not set, no volume adjustment is applied
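The stage order and skip rules described above can be summarized in a short sketch. The function and stage names are hypothetical. The sketch also assumes that the volume stage still applies to test sources, since only the first three stages are documented as skipped.

```python
def plan_stages(source, noise_filter=True, noise_suppression=True, agc=True, volume=None):
    """Return the processing stages, in order, that would run for this configuration."""
    stages = []
    if source != "test":  # the first three stages are skipped for test sources
        if noise_filter:
            stages.append("high_pass")
        if noise_suppression:
            stages.append("noise_suppression")
        if agc:
            stages.append("agc")
    if volume is not None:  # no volume stage when the value is not set
        if not 0.0 <= volume <= 2.0:
            raise ValueError("volume must be between 0.0 and 2.0")
        stages.append("volume")
    return stages


def apply_volume(samples, volume):
    """Scale samples by the volume factor (1.0 leaves them unchanged)."""
    return [s * volume for s in samples]
```

For example, `plan_stages("local")` yields the three default stages in order, and `plan_stages("test", volume=1.5)` yields only the volume stage.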
Output Behavior
Audio output depends on the transport type:
RTSP Output
Audio is included as an additional stream alongside the video in the RTSP output. Receivers that support Codec2 can decode it; others will see only the video stream. Works with any video codec (H.264, H.265, AV1).
UDP / Multicast / TCP Output
Audio is multiplexed with video using the FlexMux container format. This requires AV1 as the video codec — the pipeline validator will reject configurations that combine audio with H.264 or H.265 over UDP/multicast/TCP output.
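The transport and codec rule can be expressed as a small check. This is a sketch of the constraint as described, not the pipeline validator itself; the function name, the lowercase transport and codec strings, and the error message are assumptions.

```python
# Transports that carry audio multiplexed with video in the FlexMux container.
MUXED_TRANSPORTS = {"udp", "multicast", "tcp"}


def validate_audio_config(transport: str, video_codec: str, audio_enabled: bool) -> list:
    """Return a list of validation errors (empty when the combination is allowed)."""
    errors = []
    needs_flexmux = audio_enabled and transport.lower() in MUXED_TRANSPORTS
    if needs_flexmux and video_codec.lower() != "av1":
        errors.append(
            f"audio over {transport} requires AV1 video, got {video_codec}"
        )
    return errors
```

RTSP output passes this check with any of the listed video codecs, because audio travels as a separate stream there.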
Configuring Audio in the Web Interface
1. Open the Add Pipeline or Edit Pipeline page
2. Expand the Audio section
3. Check Enable Audio Encoding
4. Select your audio source (Stream, Local, or Test)
5. For Local source, select the ALSA device from the dropdown
6. For Test source, select a wave type
7. Choose a Codec2 mode (default: 2400)
8. Adjust processing options as needed:
   - Noise filter, noise suppression (with level), AGC, and volume
9. Start the pipeline
FlexMux Constraint
When using UDP, multicast, or TCP output with audio, the video codec must be set to AV1. The web interface will show a validation error if this constraint is not met.
Stream Audio
When using "Stream" as the audio source, your input must contain an audio track. If no audio is detected, the pipeline will start without audio and log a warning.
For information on receiving and playing back streams with audio, including required plugins, see Receiving & Playback.