channelsplit, join, amix, amerge

doc

https://ffmpeg.org/ffmpeg-filters.html#channelsplit, https://ffmpeg.org/ffmpeg-filters.html#join, https://ffmpeg.org/ffmpeg-filters.html#amix, https://ffmpeg.org/ffmpeg-filters.html#amerge-1, (ffmpeg-utils)2.8. Syntax | Channel Layout, https://ffmpeg.org/ffmpeg-filters.html#aformat-1

see also

pan, channelmap, atempo, asetrate, aresample, aformat

streams and channels

Before looking at the individual filters, first recall the distinction between streams and channels.

First, containers don’t just store one video stream plus one audio stream. For example, the following creates a video with multiple audio streams, so-called “multiplexed audio”.

There are container-specific rules, so care must be taken depending on the recipient. Note also that this is completely unrelated to the kind of multiplexed audio that a DVD player can switch between.
[me@host: ~]$ ffmpeg -y \
> -i video.wmv \
> -i japanese.wma \
> -i englishdub.wma \
> -map '0:v' -map '1:a' -map '2:a' -shortest out.mp4
[me@host: ~]$ ffprobe -hide_banner out.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'out.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.71.100
  Duration: 00:00:32.24, start: 0.000000, bitrate: 578 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 323 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 132 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
    Stream #0:2(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 130 kb/s
    Metadata:
      handler_name    : SoundHandler

Most media players recognize this and let you switch which stream to watch (or listen to). In the case of ffplay, you can cycle through audio streams by typing a, video streams by typing v, and subtitle streams by typing t.

Note

ffplay’s help describes these keys as “switching channel”, so don’t be confused: this operation is actually “stream switching”.

On the other hand, an “audio channel” means “front left / front right / …” within a single stream. The targets of pan, channelmap, etc. are these channels, while the targets of channelsplit, join, amix and amerge are whole streams.

channelsplit

doc

https://ffmpeg.org/ffmpeg-filters.html#channelsplit, (ffmpeg-utils)2.8. Syntax | Channel Layout

The description in the official documentation:

Split each channel from an input audio stream into a separate output stream.

It accepts the following parameters:

  • channel_layout

    The channel layout of the input stream. The default is “stereo”.

  • channels

    A channel layout describing the channels to be extracted as separate output streams or “all” to extract each input channel as a separate stream. The default is “all”.

Choosing channels not present in channel layout in the input will result in an error.

All of the examples in the official documentation write their results into files:

For example, assuming a stereo input MP3 file,

[me@host: ~]$ ffmpeg -i in.mp3 \
> -filter_complex channelsplit out.mkv

will create an output Matroska file with two audio streams, one containing only the left channel and the other the right channel.

Split a 5.1 WAV file into per-channel files:

[me@host: ~]$ ffmpeg -i in.wav -filter_complex "
> channelsplit=channel_layout=5.1[FL][FR][FC][LFE][SL][SR]" \
> -map '[FL]' front_left.wav \
> -map '[FR]' front_right.wav \
> -map '[FC]' front_center.wav \
> -map '[LFE]' lfe.wav \
> -map '[SL]' side_left.wav \
> -map '[SR]' side_right.wav

Extract only LFE from a 5.1 WAV file:

[me@host: ~]$ ffmpeg -i in.wav -filter_complex "
> channelsplit=channel_layout=5.1:channels=LFE[LFE]" \
> -map '[LFE]' lfe.wav

Perhaps the most commonly used pattern, though, is to apply a different filter to each channel within a filter graph:

[me@host: ~]$ # assuming that sampling rate of some.wav is 48000Hz
[me@host: ~]$ ffplay -f lavfi "
> amovie=some.wav,channelsplit[Lu][R];
> [Lu]asetrate='48000/1.0089',atempo=1.0089,aresample=48000[L];
> [L][R]join"
[me@host: ~]$ ffplay -f lavfi "
> amovie=some.wav
> ,asplit[out1][a];
> [a]channelsplit[a1][a2];
> [a1]showcqt=s=960x270,setsar=1[v1];
> [a2]showcqt=s=960x270,setsar=1,vflip[v2];
> [v1][v2]vstack[out0]"

amerge

doc

https://ffmpeg.org/ffmpeg-filters.html#amerge-1, (ffmpeg-utils)2.8. Syntax | Channel Layout

The description in the official documentation:

Merge two or more audio streams into a single multi-channel stream.

I don’t think the name “merge” or the detailed description in the official documentation makes it clear what this filter actually does. In short, it packs the channels of multiple input streams into a single output stream; that’s all. (The one-line description at the top is clear enough, but the detailed description is redundant and hard to follow.)

A simple illustration, for example:

+- stream 1 -+
| ch1: [FL]  |                   +- out stream -+
| ch2: [FR]  | ---+              | ch1: [FL]    |
| ch3: [FC]  |    |              | ch2: [FR]    |
+------------+    +- [amerge] -> | ch3: [FC]    |
+- stream 2 -+    |              | ch4: [LFE]   |
| ch1: [LFE] |    |              | ch5: [SL]    |
| ch2: [SL]  | ---+              | ch6: [SR]    |
| ch3: [SR]  |                   +--------------+
+------------+

This filter is very easy to use:

[me@host: ~]$ ffplay -f lavfi "
> amovie=audio1.wav[a1];
> amovie=audio2.wav[a2];
> amovie=audio3.wav[a3];
> [a1][a2][a3]amerge=3"

However, this is all it can do. In particular, you have no control over the output channel layout: as the illustration above shows, the inputs alone determine the output.

So in most cases you will chain other filters after this, such as pan or channelmap.
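
For instance, here is a sketch of the common “amerge then pan” chain: merge two stereo inputs into one 4-channel stream, then pan it back down to a stereo mix. The file names and the 0.5 weights are placeholders of mine, not from any example above.

```shell
# Sketch: amerge two stereo files into one 4-channel stream, then pan it
# back down to a stereo mix. music.wav / voice.wav are placeholder names.
graph='
amovie=music.wav[a1];
amovie=voice.wav[a2];
[a1][a2]amerge=inputs=2
, pan=stereo| c0 < 0.5*c0 + 0.5*c2
            | c1 < 0.5*c1 + 0.5*c3'
# To listen:  ffplay -f lavfi "$graph"
printf '%s\n' "$graph"
```

Here amerge only fixes which input channel ends up at which index (c0..c3); pan then decides the final layout and the weights.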

join

doc

https://ffmpeg.org/ffmpeg-filters.html#join, (ffmpeg-utils)2.8. Syntax | Channel Layout

The description in the official documentation:

Join multiple input streams into one multi-channel stream.

What the join filter does is essentially the same as amerge, but unlike amerge it lets you control the output channel layout.

It accepts the following parameters:

  • inputs

    The number of input streams. It defaults to 2.

  • channel_layout

    The desired output channel layout. It defaults to stereo.

  • map

    Map channels from inputs to output. The argument is a ‘|’-separated list of mappings, each in the input_idx.in_channel-out_channel form. input_idx is the 0-based index of the input stream. in_channel can be either the name of the input channel (e.g. FL for front left) or its index in the specified input stream. out_channel is the name of the output channel.

The specification of “map” is, to put it mildly, hard to read.

[me@host: ~]$ ffmpeg \
> -i fl.wav \
> -i fr.wav \
> -i fc.wav \
> -i sl.wav \
> -i sr.wav \
> -i lfe.wav \
> -filter_complex "
> join=inputs=6
> :channel_layout=5.1
> :map='0.0-FL|1.0-FR|2.0-FC|3.0-SL|4.0-SR|5.0-LFE'" out.wav

To a first-timer this looks like an incantation, and it is hard to read even for those who are used to writing it.

The hyphen, as in channelmap, is a separator between input and output. The period is, of course, not a floating-point period, and has nothing to do with “5.1ch” and the like; like the hyphen, it is a separator, splitting the stream index from the channel (index) within that stream.

So “2.0-FC” in the above example means:

“Take channel 0 of input stream 2 (fc.wav in this case) and output it as FC.”
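
One way to get used to the input_idx.in_channel-out_channel form is to build the map value mechanically. The helper below is purely illustrative (plain shell, not part of ffmpeg), assuming input stream i is a mono file carrying the i-th output channel:

```shell
# Illustrative helper (not part of ffmpeg): build a join "map" value,
# assuming input stream i is a mono file carrying the i-th channel below.
channels="FL FR FC LFE SL SR"   # desired output channels, in input order
map=""
i=0
for ch in $channels; do
  # channel 0 of input stream $i goes to output channel $ch
  map="${map:+$map|}$i.0-$ch"
  i=$((i + 1))
done
echo "$map"
# → 0.0-FL|1.0-FR|2.0-FC|3.0-LFE|4.0-SL|5.0-SR
```

The printed value can then be passed as the map parameter of join (quoted as in the examples here).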

Here is a slightly simpler example:

[me@host: ~]$ ffplay -f lavfi "
> sine=440:d=10,aphaser[a0];
> sine=220:d=10,tremolo=f=2[a1];
> sine=880:d=10,tremolo=f=4[a2];
> [a0][a1][a2]
> join=3:channel_layout=2.1
> :map='0.0 -FR| 1.0 -LFE| 2.0 -FL'
> ,asplit[out1],showvolume,scale=1000:-1[out0]"
[me@host: ~]$ ffplay -f lavfi "
> aevalsrc='0.3*sin(440*2*PI*t)|0.3*sin(880*2*PI*t)':d=10,aphaser[a0];
> sine=220:d=10,tremolo=f=2[a1];
> [a0][a1]
> join=2:channel_layout=2.1
> :map='0.0 -FL| 0.1 -FR| 1.0 -LFE'
> ,asplit[out1],showvolume,scale=1000:-1[out0]"

Note

As with the notes for pan, be careful when inserting spaces for readability. The following are rejected:

[me@host: ~]$ # NG!
[me@host: ~]$ ffplay -f lavfi "
> sine=440:d=10,aphaser[a0];
> sine=220:d=10,tremolo=f=2[a1];
> sine=880:d=10,tremolo=f=4[a2];
> [a0][a1][a2]
> join=3:channel_layout=2.1
> :map='0.0 - FR | 1.0 - LFE | 2.0 - FL '
> ,asplit[out1],showvolume,scale=1000:-1[out0]"

amix

doc

https://ffmpeg.org/ffmpeg-filters.html#amix, (ffmpeg-utils)2.8. Syntax | Channel Layout

The description in the official documentation:

Mixes multiple audio inputs into a single output.

Note that this filter only supports float samples (the amerge and pan audio filters support many formats). If the amix input has integer samples then aresample will be automatically inserted to perform the conversion to float samples.

Almost everything this filter can do can also be achieved with “amerge (etc.) + pan”. The following two commands are “almost” the same.

[me@host: ~]$ ffplay -f lavfi "
> amovie=wolframtones_02.wav[a1];
> amovie=wolframtones_03.wav[a2];
> [a1][a2]amerge
> , pan=stereo| c0 < 0.2*c0 + 0.8*c2
>             | c1 < 0.2*c1 + 0.8*c3"
[me@host: ~]$ # weights option is not available on ffmpeg 3.x
[me@host: ~]$ "/c/Program Files/ffmpeg-4.1-win64-shared/bin/ffmpeg" \
> -filter_complex "
> amovie=wolframtones_02.wav[a1];
> amovie=wolframtones_03.wav[a2];
> [a1][a2]amix=weights='0.2 0.8'" -f wav - | ffplay -

If you want weighting but don’t have ffmpeg 4.x, you have no choice but to use pan. Otherwise, some parameters are available only in amix, so use them on a case-by-case basis.
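
For example, duration and dropout_transition exist only on amix: duration chooses how the output length follows the inputs (longest, shortest, or first), and dropout_transition sets how long volume renormalization takes when an input ends. A sketch (the file names are placeholders of mine):

```shell
# Sketch: mix three inputs using amix-only parameters. The output ends when
# the first input ends; when an input drops out, the volume is renormalized
# over 3 seconds. bgm.wav / voice1.wav / voice2.wav are placeholder names.
graph='
amovie=bgm.wav[a1];
amovie=voice1.wav[a2];
amovie=voice2.wav[a3];
[a1][a2][a3]amix=inputs=3:duration=first:dropout_transition=3'
# To listen:  ffplay -f lavfi "$graph"
printf '%s\n' "$graph"
```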

Of course, if you just want the simplest possible mix, amix is by far the easiest:

[me@host: ~]$ ffmpeg -y -i 2ch_1.wav -i 2ch_2.wav \
> -filter_complex amix mixed.wav

And you can’t do the same as the following with amix:

[me@host: ~]$ ffplay -f lavfi "
> amovie=wolframtones_02.wav[a1];
> amovie=wolframtones_03.wav[a2];
> [a1][a2]amerge
> , pan=stereo| c0 < 0.2*c0 + 0.8*c2
>             | c1 < 0.8*c1 + 0.2*c3"