Capturing Desktop/Speaker

see also

ImageGrab Module (macOS and Windows only) (Python Pillow examples)

If you use Unix or macOS, see the official wiki.

The official wiki is also useful if you are a Windows user, but it explains far too little. Much of what it describes requires preparing things other than ffmpeg itself, and the wiki does not explain any of that.

First of all, “UScreenCapture” is a third-party DirectShow driver that you have to download and install.

Second, “Stereo Mix”, which the wiki mentions only by name, requires configuring the Windows system. You can learn how to do this from this site.

Once you’ve done this, if you use the DirectShow approach and want to use “Stereo Mix”, remember to activate the corresponding speaker. For example, if your PC is connected to a TV with an HDMI cable (like mine), the TV’s audio playback device will be the active one, so you have to switch to the speaker that “Stereo Mix” belongs to. “Stereo Mix” is, in short, a “bypass”. If you are familiar with Unix commands, think of it as analogous to the “tee” command:

in the case of my “Realtek High Definition Audio”
                   +-------------------------------+
[audio input] ---> | Realtek High Definition Audio |
                   |                     [Speaker] | ---> [actual speaker device]
                   |                [Stereo Mixer] | ---> [ffmpeg, etc.]
                   +-------------------------------+
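
The “tee” analogy can be tried directly in a shell. This is only an illustration of the analogy, not of the audio driver itself; the file names speaker.txt and stereo_mix.txt are made up for the example.

```shell
#!/bin/sh
# Analogy only: "tee" copies one input stream to two destinations,
# just as the audio reaches both the speaker and "Stereo Mix".
tmpdir=$(mktemp -d)

# One input, two identical outputs (hypothetical file names).
printf 'audio samples\n' | tee "$tmpdir/speaker.txt" > "$tmpdir/stereo_mix.txt"

if cmp -s "$tmpdir/speaker.txt" "$tmpdir/stereo_mix.txt"; then
    result="identical"
else
    result="different"
fi
echo "speaker and stereo mix copies are $result"

rm -rf "$tmpdir"
```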

In my case:

[me@host: ~]$ ffmpeg -hide_banner -list_devices true -f dshow -i dummy 2>&1
[dshow @ 00000000005b17a0] DirectShow video devices (some may be both video and audio devices)
[dshow @ 00000000005b17a0]  "Chicony USB 2.0 Camera"
[dshow @ 00000000005b17a0]     Alternative name "@device_pnp_\\?\usb#vid_04f2&pid_b43b&mi_00#7&8618e4c&0&0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global"
[dshow @ 00000000005b17a0]  "UScreenCapture"
[dshow @ 00000000005b17a0]     Alternative name "@device_sw_{860BB310-5D01-11D0-BD3B-00A0C911CE86}\UScreenCapture"
[dshow @ 00000000005b17a0] DirectShow audio devices
[dshow @ 00000000005b17a0]  "ステレオ ミキサー (Realtek High Definit"
[dshow @ 00000000005b17a0]     Alternative name "@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\ステレオ ミキサー (Realtek High Definit"
[dshow @ 00000000005b17a0]  "マイク (Realtek High Definition Au"
[dshow @ 00000000005b17a0]     Alternative name "@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\マイク (Realtek High Definition Au"
dummy: Immediate exit requested

Given these devices, the following script can capture the entire desktop screen and the speaker output:

in the case of my “Realtek High Definition Audio”
#! /bin/sh
ffmpeg -y \
    -rtbufsize 500M \
    -thread_queue_size 64 \
    -f dshow -i video='UScreenCapture' \
    -rtbufsize 500M \
    -thread_queue_size 64 \
    -f dshow -i audio='ステレオ ミキサー (Realtek High Definit' \
    recorded.mp4
# "ステレオ ミキサー" means "Stereo Mixer".

Or you can use the “Alternative name”:

in the case of my “Realtek High Definition Audio”
#! /bin/sh
ffmpeg -y \
    -rtbufsize 500M \
    -thread_queue_size 64 \
    -f dshow -i video='UScreenCapture' \
    -rtbufsize 500M \
    -thread_queue_size 64 \
    -f dshow -i audio='@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\ステレオ ミキサー (Realtek High Definit' \
    recorded.mp4
# "ステレオ ミキサー" means "Stereo Mixer".
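
The device names passed to “-i” come from the “-list_devices” output, so it can help to pull them out mechanically rather than retype them. The following is a minimal sketch; the function name dshow_device_names is an assumption of this example, and it assumes the output looks like the listing shown earlier (a quoted name right after the “[dshow @ …]” prefix).

```shell
#!/bin/sh
# Print only the device names from "ffmpeg -list_devices" output read on stdin.
# dshow_device_names is a hypothetical helper, not part of ffmpeg.
dshow_device_names() {
    # Drop CRs (Windows line endings), then keep lines whose first token after
    # the "[dshow @ ...]" prefix is a quoted name. "Alternative name" lines and
    # section headers do not match. LC_ALL=C makes sed treat input as bytes.
    tr -d '\r' |
        LC_ALL=C sed -n 's/^\[dshow @ [^]]*] *"\([^"]*\)".*$/\1/p'
}

# Demo on a few lines copied from the listing above.
sample=$(cat <<'EOF'
[dshow @ 00000000005b17a0] DirectShow video devices (some may be both video and audio devices)
[dshow @ 00000000005b17a0]  "Chicony USB 2.0 Camera"
[dshow @ 00000000005b17a0]  "UScreenCapture"
[dshow @ 00000000005b17a0]     Alternative name "@device_sw_{860BB310-5D01-11D0-BD3B-00A0C911CE86}\UScreenCapture"
EOF
)
printf '%s\n' "$sample" | dshow_device_names
```

With a real ffmpeg, the same function would be fed by `ffmpeg -hide_banner -list_devices true -f dshow -i dummy 2>&1 | dshow_device_names`.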

As you can see, localization unfortunately gets in the way. Because you must give ffmpeg the DirectShow filter name, on Japanese Windows you have to specify its Japanese name. Moreover, you must pass the name in Shift-JIS (in the case of Japanese), even though “-list_devices” reports it in UTF-8.
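
To see the difference between the two encodings, the name can be converted with iconv. This is a sketch under assumptions: it assumes an iconv command is available in your shell and that it knows the encoding name SHIFT_JIS (on some systems CP932, the Windows variant, may be the better choice).

```shell
#!/bin/sh
# Convert a UTF-8 device name to Shift-JIS and back (assumes iconv is installed).
name_utf8='ステレオ ミキサー'
tmp=$(mktemp)

# UTF-8 -> Shift-JIS, written to a temporary file as raw bytes.
printf '%s' "$name_utf8" | iconv -f UTF-8 -t SHIFT_JIS > "$tmp"

# Compare the byte lengths of the two encodings.
utf8_bytes=$(printf '%s' "$name_utf8" | wc -c | tr -d ' ')
sjis_bytes=$(wc -c < "$tmp" | tr -d ' ')
echo "UTF-8: $utf8_bytes bytes, Shift-JIS: $sjis_bytes bytes"

# Shift-JIS -> UTF-8 should give back the original name.
roundtrip=$(iconv -f SHIFT_JIS -t UTF-8 < "$tmp")
rm -f "$tmp"
```

In the same way, a whole script written in UTF-8 could be converted with something like `iconv -f UTF-8 -t SHIFT_JIS script_utf8.sh > script_sjis.sh` (both file names are hypothetical).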

If you do not need to capture the desktop and want only the speaker output, you can of course just specify MP3 (etc.) as the output file, but it is more interesting to visualize the audio as follows:

in the case of my “Realtek High Definition Audio”
#! /bin/sh
# -*- coding: shift_jis -*-
# name: stereomixrecord.sh
#
ffmpeg -y \
    -rtbufsize 500M \
    -thread_queue_size 64 \
    -f dshow -i audio='ステレオ ミキサー (Realtek High Definit' \
    -filter_complex "
[0:a]showcqt=s=1280x720:fps=24,crop=1242:720:0:0,scale=1280:720,setsar=1[vcqt];
[0:a]showvolume=r=24,colorkey=black,scale=853:26,setsar=1[vv];
[0:a]showwaves=split_channels=1:r=24:mode=line:
colors=red@0.8|green@0.9:s=853x320,setsar=1[vs];
[vs][vv]overlay=x=0:y=(H-h)/2[v1];
[vcqt][v1]overlay=x=W-w-33:y=33
" \
    ${extra_outopt} \
    "${1:-recorded.mp4}"

[me@host: ~]$ # one way to test whether it works
[me@host: ~]$ extra_outopt="-f matroska" stereomixrecord.sh - | ffplay -
   ...
[me@host: ~]$ # capture and output to out.mp4
[me@host: ~]$ stereomixrecord.sh out.mp4
   ...

see also

`showcqt’, `showwaves’ and `showvolume’ (overlay)
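
For the plain audio-only case mentioned earlier (just naming an MP3 file as the output), a minimal sketch could look like the following. The names run and record_speaker and the DRY_RUN switch are assumptions of this example, not part of ffmpeg, and whether MP3 encoding works depends on how your ffmpeg was built.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Record only the speaker output.
#   $1: DirectShow audio device name (in the encoding your Windows expects)
#   $2: output file; the extension selects the format (e.g. .mp3)
record_speaker() {
    run ffmpeg -y \
        -rtbufsize 500M \
        -thread_queue_size 64 \
        -f dshow -i "audio=$1" \
        "$2"
}

DRY_RUN=1   # remove this line to actually record
record_speaker 'ステレオ ミキサー (Realtek High Definit' recorded.mp3
```

With DRY_RUN set, the script only prints the ffmpeg command line, which is a cheap way to check the device name before recording.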

For capturing the desktop, you can also use “gdigrab” like this:

in the case of my “2- Realtek(R) Audio” (Windows 10)
#! /bin/bash
# -*- coding: shift_jis -*-
"/c/Program Files/ffmpeg-4.1.1-win64-shared/bin/ffmpeg" -y \
    -f gdigrab -framerate 10 -i desktop  \
    -f dshow -i audio='ステレオ ミキサー (2- Realtek(R) Audio)' \
    -pix_fmt yuv420p \
    -vf crop=$((3840/2)):$((2160/2)):0:0 \
    ${extra_outopt} \
    "${1:-recorded.mp4}"