Use of audio visualization as a visual effect

doc

https://ffmpeg.org/ffmpeg-filters.html#showcqt, https://ffmpeg.org/ffmpeg-filters.html#alphamerge, https://ffmpeg.org/ffmpeg-filters.html#vflip, https://ffmpeg.org/ffmpeg-filters.html#scale, https://ffmpeg.org/ffmpeg-filters.html#geq, https://ffmpeg.org/ffmpeg-filters.html#colorkey, https://ffmpeg.org/ffmpeg-filters.html#overlay-1, https://ffmpeg.org/ffmpeg-filters.html#blend_002c-tblend

ffmpeg does not provide a way to use audio data directly as a visual effect. However, it offers several audio visualization filters, and the video they produce can itself be used as a visual effect.

Here are a few examples. Results good enough for professional use are probably hard to achieve with ffmpeg alone, but if your needs are simple, approaches like the ones shown below may be sufficient.

with “alphamerge” (1)

00:00:45

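The first example applies “alphamerge” to the video produced by “showcqt”. The visualization becomes the alpha channel of the first picture, so the first picture is visible where the spectrum is bright and the second picture shows through where it is dark.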

#! /bin/sh
ifn="Air on the G String (from Orchestral Suite no. 3, BWV 1068).mp3"
ofn="out_01.mp4"
bgi1="beach-dawn-dusk-274053.jpg"
bgi2="art-backlit-beach-256807.jpg"
#
ffmpeg -y -i "${ifn}" -i "${bgi1}" -i "${bgi2}" -filter_complex "
[0:a]showcqt=s=1280x720:basefreq=27.5:endfreq=4186.0[alpha];
[1:v]scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1[1v];
[2:v]scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1[bg];
[1v][alpha]alphamerge[fg];
[bg][fg]overlay,setsar=1[v]
" -map '[v]' -map '0:a' -c:a copy -shortest "${ofn}"
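
To make the roles of “alphamerge” and “overlay” concrete, the following sketch reproduces the per-pixel arithmetic in plain shell: the brightness of the visualization becomes the alpha of the foreground, and “overlay” then mixes foreground and background by that alpha. The function name `composite` and the sample values are made up for illustration, and the rounding is a simplification; ffmpeg's internal rounding may differ.

```sh
#! /bin/sh
# composite FG BG ALPHA: mix one 8-bit foreground sample with one 8-bit
# background sample, using ALPHA (0-255) taken from the visualization.
# Hypothetical helper for illustration only; it is not part of ffmpeg.
composite() {
    fg=$1 bg=$2 alpha=$3
    # Weighted mix; adding 127 rounds to the nearest integer.
    echo $(( (fg * alpha + bg * (255 - alpha) + 127) / 255 ))
}

composite 200 50 0      # spectrum is black: background only (50)
composite 200 50 255    # spectrum is white: foreground only (200)
composite 200 50 128    # in between: a mix of both (125)
```

This is why a quiet passage shows mostly the background picture, while loud frequency bands reveal the foreground picture in the shape of the spectrum.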

with “alphamerge” (2)

00:01:45

This is essentially the same as the first example, but it visualizes each stereo channel separately with “showcqt”, flips the right channel's picture with “vflip”, and stacks the two results. (The extra “scale” steps around “vflip” are redundant in principle; they work around a bug in “vflip”. If you omit them, ffmpeg will crash.)

#! /bin/sh
ifn="Air on the G String (from Orchestral Suite no. 3, BWV 1068).mp3"
ofn="out_02.mp4"
bgi1="beach-dawn-dusk-274053.jpg"
bgi2="art-backlit-beach-256807.jpg"
#
ffmpeg -y -i "${ifn}" -i "${bgi1}" -i "${bgi2}" -filter_complex "
[0:a]
pan=stereo|c0=c0|c1=c0,showcqt=s=1280x360:basefreq=27.5:endfreq=4186.0
[vcqt_L];
[0:a]
pan=stereo|c0=c1|c1=c1,showcqt=s=1280x360:basefreq=27.5:endfreq=4186.0
,scale=1280:720,vflip,scale=1280:360
[vcqt_R];
[vcqt_L][vcqt_R]vstack[alpha];

[1:v]scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1[1v];
[2:v]scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1[bg];
[1v][alpha]alphamerge[fg];
[bg][fg]overlay,setsar=1[v]
" -map '[v]' -map '0:a' -c:a copy -shortest "${ofn}"
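
The two “showcqt” chains above differ only in which input channel “pan” copies to both outputs. As a small sketch of that pattern, the hypothetical helper `cqt_chain` below (not an ffmpeg feature, just shell string building) prints the chain for a given channel index, which can then be placed in the filtergraph.

```sh
#! /bin/sh
# cqt_chain N: print a pan+showcqt chain that visualizes only input channel N
# (0 = left, 1 = right for a stereo source). Hypothetical helper.
cqt_chain() {
    ch=$1
    printf 'pan=stereo|c0=c%s|c1=c%s,showcqt=s=1280x360:basefreq=27.5:endfreq=4186.0\n' "$ch" "$ch"
}

cqt_chain 0   # chain used for [vcqt_L]
cqt_chain 1   # chain used for [vcqt_R], before the scale/vflip steps
```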

with “alphamerge” (3)

00:02:45

This is almost the same as the earlier examples, but it uses only one picture instead of two. The background is a black frame produced by applying “geq” to a copy of the “showcqt” output.

#! /bin/sh
ifn="Air on the G String (from Orchestral Suite no. 3, BWV 1068).mp3"
ofn="out_03.mp4"
bgi="beach-dawn-dusk-274053.jpg"
#
ffmpeg -y -i "${ifn}" -i "${bgi}" -filter_complex "
[0:a]showcqt=s=1280x720:basefreq=27.5:endfreq=4186.0,split[alpha][bgbase];
[1:v]scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1[1v];
[1v][alpha]alphamerge[fg];
[bgbase]geq=r=0:g=0:b=0[bg];
[bg][fg]overlay,setsar=1[v]
" -map '[v]' -map '0:a' -c:a copy -shortest "${ofn}"
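
All of the scripts so far fit the still picture into the 1280x720 frame with scale=-1:720 followed by “pad”. The sketch below works through that arithmetic for a given source size. The helper name `fit_720p` and the sample picture sizes are assumptions for illustration, and the integer rounding is approximate. It also flags the case where the scaled picture is wider than 1280, in which “pad” cannot work because the padded frame would be smaller than its input.

```sh
#! /bin/sh
# fit_720p W H: for a W x H picture, print the size after scale=-1:720 and the
# offsets that pad='1280:720:(ow-iw)/2:(oh-ih)/2' would use.
# Hypothetical helper for illustration only.
fit_720p() {
    w=$1 h=$2
    sw=$(( (w * 720 + h / 2) / h ))     # scaled width, aspect ratio kept
    if [ "$sw" -gt 1280 ]; then
        echo "scaled width ${sw} exceeds 1280: pad would fail"
    else
        echo "scaled=${sw}x720 pad_x=$(( (1280 - sw) / 2 )) pad_y=0"
    fi
}

fit_720p 1920 1280    # 3:2 picture: scaled=1080x720 pad_x=100 pad_y=0
fit_720p 4000 1500    # very wide picture: too wide for this pad
```

For pictures wider than 16:9, one option is to scale by width instead (scale=1280:-1) and let “pad” add bars at the top and bottom.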

with “geq” (1)

00:03:45

In this example, the top layer is the picture with its black areas made transparent by “colorkey”. The bottom layer uses “geq” to take a single row (Y = 355, slightly above the axis) of the “showcqt” output and repeat it over the whole frame height.

#! /bin/sh
ifn="Air on the G String (from Orchestral Suite no. 3, BWV 1068).mp3"
ofn="out_04.mp4"
bgi="art-backlit-beach-256807.jpg"
#
ffmpeg -y -i "${ifn}" -i "${bgi}" -filter_complex "
[0:a]
showcqt=s=1280x720:basefreq=27.5:endfreq=4186.0
,geq='p(X,355)'
[vcqt];
[1:v]
scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1
,colorkey=color=black:similarity=0.1
[1v];

[vcqt][1v]overlay,setsar=1[v]
" -map '[v]' -map '0:a' -c:a copy -shortest "${ofn}"

with “geq” (2)

00:04:45

This is essentially the same as the previous example, but the whole frame is filled with the average of three points (X = 240, 400 and 560, all at Y = 355) of the “showcqt” output.

#! /bin/sh
ifn="Air on the G String (from Orchestral Suite no. 3, BWV 1068).mp3"
ofn="out_05.mp4"
bgi="art-backlit-beach-256807.jpg"
#
ffmpeg -y -i "${ifn}" -i "${bgi}" -filter_complex "
[0:a]
showcqt=s=1280x720:basefreq=27.5:endfreq=4186.0
,geq='p(240,355)/3 + p(400,355)/3 + p(560,355)/3'
[vcqt];
[1:v]
scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1
,colorkey=color=black:similarity=0.1
[1v];

[vcqt][1v]overlay,setsar=1[v]
" -map '[v]' -map '0:a' -c:a copy -shortest "${ofn}"
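
The averaging expression above can be generalized to any number of sample points. The sketch below builds such a “geq” expression from a row and a list of X positions. `geq_avg` is a hypothetical helper name, and its output is only a string meant to be placed in the filtergraph.

```sh
#! /bin/sh
# geq_avg Y X1 X2 ...: print a geq expression that averages p(X,Y) over the
# given X positions. Hypothetical helper, plain string building.
geq_avg() {
    y=$1
    shift
    n=$#
    out=""
    for x in "$@"; do
        # Join terms with " + ", skipping the separator before the first one.
        out="${out:+$out + }p($x,$y)/$n"
    done
    echo "$out"
}

geq_avg 355 240 400 560
# prints: p(240,355)/3 + p(400,355)/3 + p(560,355)/3
```

Dividing each term by the number of points before summing, as the original expression does, keeps every intermediate value within the 8-bit sample range.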

with “geq” (3)

00:05:45

This is similar to the previous example, but instead of “colorkey” and “overlay”, the two layers are combined directly with “blend” (all_mode=average).

#! /bin/sh
ifn="Air on the G String (from Orchestral Suite no. 3, BWV 1068).mp3"
ofn="out_06.mp4"
bgi="art-backlit-beach-256807.jpg"
#
ffmpeg -y -i "${ifn}" -i "${bgi}" -filter_complex "
[0:a]
showcqt=s=1280x720:basefreq=27.5:endfreq=4186.0
,geq='p(240,355)/3 + p(400,355)/3 + p(560,355)/3'
,setsar=1
[vcqt];
[1:v]
scale=-1:720,pad='1280:720:(ow-iw)/2:(oh-ih)/2',loop=size=2:loop=-1
,setsar=1
[1v];

[vcqt][1v]blend=all_mode=average[v]
" -map '[v]' -map '0:a' -c:a copy -shortest "${ofn}"
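
With all_mode=average, “blend” replaces each output sample with the mean of the two inputs, so the flat color coming from “geq” tints the whole picture evenly. Below is a minimal sketch of that arithmetic in shell, with a made-up helper name `blend_average` and made-up sample values. The integer division here truncates; ffmpeg's exact rounding may differ.

```sh
#! /bin/sh
# blend_average A B: mean of two 8-bit samples, as in blend's "average" mode.
# Hypothetical helper for illustration only.
blend_average() {
    echo $(( ($1 + $2) / 2 ))
}

blend_average 0 200     # visualization is black: picture is darkened (100)
blend_average 255 200   # visualization is white: picture is brightened (227)
```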