
Audio Encoding

Flex Video supports optional audio encoding alongside video streams using Codec2, a low-bitrate speech codec designed specifically for voice communication.

About Codec2

Codec2 is an open-source speech codec that operates at extremely low bitrates: 1,200 to 3,200 bits per second in the modes listed below. It is optimized for speech, not for music or general audio, which makes it well suited to tactical voice communication over bandwidth-constrained networks.

Mode   Bitrate    Use Case
3200   3.2 kbps   Highest voice quality
2400   2.4 kbps   Default; good balance of quality and bandwidth
1600   1.6 kbps   Reduced quality, lower bandwidth
1400   1.4 kbps   Low-bandwidth environments
1300   1.3 kbps   Very low bandwidth
1200   1.2 kbps   Minimum bandwidth
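
To put these bitrates in perspective, the following minimal sketch converts a mode into raw audio payload size. It covers only the Codec2 payload; container and transport overhead are not included, and the function name is illustrative rather than part of Flex Video.

```python
# Codec2 modes from the table above; each mode number equals its bitrate in bits per second.
CODEC2_MODES = (3200, 2400, 1600, 1400, 1300, 1200)


def payload_bytes(mode: int, seconds: float) -> float:
    """Raw Codec2 payload in bytes for the given duration (no container overhead)."""
    if mode not in CODEC2_MODES:
        raise ValueError(f"unsupported Codec2 mode: {mode}")
    return mode * seconds / 8  # bits per second -> bytes


if __name__ == "__main__":
    for mode in CODEC2_MODES:
        kb_per_minute = payload_bytes(mode, 60) / 1000
        print(f"mode {mode}: {kb_per_minute:.1f} kB of audio payload per minute")
```

At the default mode of 2400, one minute of audio is 18,000 bytes of payload, which is small next to almost any video stream.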

Audio Sources

Flex Video supports three audio source types:

Stream Extraction

Extract audio from the incoming video stream (e.g., an MPEG-TS stream with embedded audio). This is the default source type.

  • Requires the input stream to contain an audio track
  • Not available when using a test video source

Local Device

Capture audio from a local ALSA device such as a USB microphone.

  • Use the web interface device picker or query GET /flex/audio-devices to list available capture devices
  • Device paths use the ALSA format (e.g., dsnoop:0,0, plughw:0,0, hw:1,0, default)
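
As a rough sketch, the device list can be fetched from a script and a device path split into its parts. The base URL is something you must supply, and the assumption that the endpoint returns JSON is not confirmed by this page. The parser handles only the numeric `name:card,device` form shown above, not other ALSA naming styles.

```python
import json
import re
import urllib.request


def fetch_audio_devices(base_url: str, timeout: float = 5.0):
    """GET /flex/audio-devices and decode the body.

    Assumption: the response is JSON. Its schema is not documented here,
    so the decoded value is returned as-is.
    """
    url = base_url.rstrip("/") + "/flex/audio-devices"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Matches "default", "hw:1" and "plughw:0,0" style paths only.
_ALSA_RE = re.compile(r"^(?P<name>[A-Za-z_]+)(?::(?P<card>\d+)(?:,(?P<device>\d+))?)?$")


def parse_alsa_device(path: str):
    """Split an ALSA path into (name, card, device); missing parts are None."""
    match = _ALSA_RE.match(path)
    if match is None:
        raise ValueError(f"unrecognized ALSA device path: {path!r}")
    card = match.group("card")
    device = match.group("device")
    return (
        match.group("name"),
        int(card) if card is not None else None,
        int(device) if device is not None else None,
    )
```

For example, `parse_alsa_device("hw:1,0")` yields the name `hw`, card 1, device 0, while `default` carries no card or device number.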

Test Tone

Generate a synthetic audio signal for testing purposes.

  • Wave types are selected by numeric index (0–12); they include sine, square, sawtooth, triangle, and silence
  • Audio processing filters are automatically skipped for test sources
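
For readers unfamiliar with the named wave types, the sketch below generates each of them as floating-point samples in the range -1.0 to 1.0. It is an illustration only, not the generator Flex Video uses, and the 8 kHz sample rate is an assumption chosen because Codec2 operates on 8 kHz speech.

```python
import math
from typing import List

SAMPLE_RATE = 8000  # Hz; assumed for this sketch


def make_tone(wave: str, freq: float, count: int, rate: int = SAMPLE_RATE) -> List[float]:
    """Return `count` samples of the requested wave type at `freq` Hz."""
    samples = []
    for i in range(count):
        phase = (freq * i / rate) % 1.0  # position within the current cycle, 0.0 to 1.0
        if wave == "sine":
            value = math.sin(2.0 * math.pi * phase)
        elif wave == "square":
            value = 1.0 if phase < 0.5 else -1.0
        elif wave == "sawtooth":
            value = 2.0 * phase - 1.0  # ramps from -1 up to +1, then resets
        elif wave == "triangle":
            value = 1.0 - 4.0 * abs(phase - 0.5)  # -1 at cycle start, +1 at mid-cycle
        elif wave == "silence":
            value = 0.0
        else:
            raise ValueError(f"unknown wave type: {wave}")
        samples.append(value)
    return samples
```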

Audio Processing

Audio passes through a configurable processing chain before encoding. Stages run in the order listed below, and each one operates on the output of the previous stage:

1. High-Pass Noise Filter

A 200 Hz high-pass filter that removes wind noise and low-frequency rumble. Enabled by default.

  • Cuts frequencies below 200 Hz
  • Effective for outdoor environments with wind or machinery noise
  • Skipped for test sources
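
To show what a 200 Hz high-pass does to a signal, here is a minimal first-order (RC-style) filter. This is a stand-in chosen for brevity: the actual filter design and order used by Flex Video are not stated on this page, and the 8 kHz sample rate is an assumption.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], cutoff_hz: float = 200.0, rate: int = 8000) -> List[float]:
    """First-order high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

With this sketch a constant (DC) offset decays to zero, a 50 Hz rumble is reduced substantially, and a 2 kHz tone passes with little loss. A first-order filter rolls off gradually, so content just below 200 Hz is attenuated rather than removed outright.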

2. Noise Suppression

ML-based noise suppression that removes background noise while preserving voice clarity. Enabled by default.

Suppression levels:

Level   Aggressiveness   Best For
1       Low              Quiet environments with minimal noise
2       Moderate         Default; general purpose
3       High             Noisy environments
4       Aggressive       Very noisy environments (may affect voice quality)

  • Skipped for test sources

3. Automatic Gain Control (AGC)

Automatic gain control with a built-in limiter. Normalizes audio levels so quiet voices are amplified and loud sounds are capped. Enabled by default.

  • Skipped for test sources
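
The idea of gain control with a limiter can be sketched as follows. This is a deliberately simplified illustration, not Flex Video's algorithm: it picks one gain per block with no smoothing between blocks, and the target level, maximum gain, limiter threshold, and block size are all assumed values.

```python
from typing import List, Sequence


def simple_agc(
    samples: Sequence[float],
    target_rms: float = 0.25,  # assumed target level
    max_gain: float = 8.0,     # assumed cap on amplification
    limit: float = 0.95,       # assumed limiter threshold
    block: int = 160,          # 20 ms at 8 kHz
) -> List[float]:
    """Scale each block toward target_rms, then hard-limit peaks."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = (sum(x * x for x in chunk) / len(chunk)) ** 0.5
        gain = min(max_gain, target_rms / rms) if rms > 0 else 1.0
        for x in chunk:
            y = x * gain
            out.append(max(-limit, min(limit, y)))  # limiter caps loud peaks
    return out
```

Quiet blocks are amplified (up to the gain cap), loud blocks are turned down, and any remaining peak above the threshold is clamped.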

4. Volume Adjustment

Optional manual volume control applied after all other processing.

  • Range: 0.0 (mute) to 2.0 (double volume)
  • 1.0 = normal level
  • When not set, no volume adjustment is applied
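
The ordering and skip rules described in this section can be summarized in a short sketch. The stage names and function signatures are hypothetical. It also assumes that the volume stage still applies to test sources, since only the three filters are listed as skipped, and that samples are 16-bit signed integers.

```python
from typing import List, Optional


def active_stages(
    source: str,
    noise_filter: bool = True,
    noise_suppression: bool = True,
    agc: bool = True,
    volume: Optional[float] = None,
) -> List[str]:
    """Return the processing stages that would run, in order."""
    is_test = source == "test"  # filters are skipped for test sources
    stages = []
    if noise_filter and not is_test:
        stages.append("high_pass")
    if noise_suppression and not is_test:
        stages.append("noise_suppression")
    if agc and not is_test:
        stages.append("agc")
    if volume is not None:  # unset volume means no volume stage at all
        if not 0.0 <= volume <= 2.0:
            raise ValueError("volume must be between 0.0 and 2.0")
        stages.append("volume")
    return stages


def apply_volume(samples: List[int], volume: float) -> List[int]:
    """Scale 16-bit samples and clip to the int16 range."""
    return [max(-32768, min(32767, int(round(s * volume)))) for s in samples]
```

Note that doubling the volume clips any sample already above half scale, which is one reason to prefer AGC over a large manual gain.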

Output Behavior

Audio output depends on the transport type:

RTSP Output

Audio is included as an additional stream alongside the video in the RTSP output. Receivers that support Codec2 can decode it; others will see only the video stream. Works with any video codec (H.264, H.265, AV1).

UDP / Multicast / TCP Output

Audio is multiplexed with video using the FlexMux container format. This requires AV1 as the video codec — the pipeline validator will reject configurations that combine audio with H.264 or H.265 over UDP/multicast/TCP output.
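
The rule can be expressed as a small check. This is a sketch of the constraint as described here, not the pipeline validator's actual code; the transport and codec strings are assumed spellings.

```python
from typing import Optional

FLEXMUX_TRANSPORTS = {"udp", "multicast", "tcp"}


def audio_output_error(transport: str, video_codec: str, audio_enabled: bool) -> Optional[str]:
    """Return an error message if the combination is invalid, else None."""
    t = transport.lower()
    codec = video_codec.lower().replace(".", "")  # accept "H.264" or "h264"
    if audio_enabled and t in FLEXMUX_TRANSPORTS and codec != "av1":
        return f"audio over {t} uses FlexMux and requires AV1, got {video_codec}"
    return None
```

RTSP output with audio passes for any of the three codecs, and any transport passes when audio is disabled.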

Configuring Audio in the Web Interface

  1. Open the Add Pipeline or Edit Pipeline page
  2. Expand the Audio section
  3. Check Enable Audio Encoding
  4. Select your audio source (Stream, Local, or Test)
  5. For Local source, select the ALSA device from the dropdown
  6. For Test source, select a wave type
  7. Choose a Codec2 mode (default: 2400)
  8. Adjust processing options as needed: noise filter, noise suppression (with level), AGC, and volume
  9. Start the pipeline
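
The choices made in these steps can be pictured as a single settings record. The class and field names below are hypothetical and do not reflect Flex Video's configuration schema; the defaults and allowed ranges are the ones stated on this page.

```python
from dataclasses import dataclass
from typing import List, Optional

CODEC2_MODES = (3200, 2400, 1600, 1400, 1300, 1200)


@dataclass
class AudioSettings:
    source: str = "stream"         # "stream", "local", or "test"
    device: Optional[str] = None   # ALSA path, needed for "local"
    wave: Optional[str] = None     # wave type, needed for "test"
    codec2_mode: int = 2400
    noise_filter: bool = True
    noise_suppression: bool = True
    suppression_level: int = 2
    agc: bool = True
    volume: Optional[float] = None  # None means no volume adjustment

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty if none found."""
        found = []
        if self.source not in ("stream", "local", "test"):
            found.append(f"unknown source: {self.source}")
        if self.source == "local" and not self.device:
            found.append("local source needs an ALSA device")
        if self.source == "test" and not self.wave:
            found.append("test source needs a wave type")
        if self.codec2_mode not in CODEC2_MODES:
            found.append(f"unsupported Codec2 mode: {self.codec2_mode}")
        if not 1 <= self.suppression_level <= 4:
            found.append("suppression level must be 1 to 4")
        if self.volume is not None and not 0.0 <= self.volume <= 2.0:
            found.append("volume must be between 0.0 and 2.0")
        return found
```

For instance, `AudioSettings(source="local")` reports a missing device, while the defaults on their own report nothing.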

FlexMux Constraint

When using UDP, multicast, or TCP output with audio, the video codec must be set to AV1. The web interface will show a validation error if this constraint is not met.

Stream Audio

When using "Stream" as the audio source, your input must contain an audio track. If no audio is detected, the pipeline will start without audio and log a warning.
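
The fallback behavior amounts to the following decision, shown as a sketch. The function and logger names are hypothetical, and how Flex Video detects an audio track is not covered here.

```python
import logging

log = logging.getLogger("flex.audio")  # hypothetical logger name


def stream_audio_active(audio_requested: bool, input_has_audio: bool) -> bool:
    """Decide whether audio runs; the pipeline starts either way."""
    if not audio_requested:
        return False
    if not input_has_audio:
        # Not fatal: video continues without audio and a warning is logged.
        log.warning("no audio track found in input; starting without audio")
        return False
    return True
```

If audio is unexpectedly missing from a running pipeline, the warning in the log is the first place to look.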

For information on receiving and playing back streams with audio, including required plugins, see Receiving & Playback.