Adding album art / an embedded cover/thumbnail
The usual reason for doing this is the “icon view” of a media list, for example in Windows Explorer or on an iPod. Windows Media Player, for instance, can also display album art while playing audio files.
Embedding the image is not difficult, but tool support varies, so the result may be disappointing. I only have Windows 7, so I will use it as the example.
For MP3
In the case of MP3, you can do it like this:
[me@host: Videos]$ ffmpeg -y -i orig.mp3 -i bg1.jpg \
> -map 0:a \
> -map 1:v -metadata:s:v comment="Cover (front)" \
> -id3v2_version 3 \
> -c:a copy \
> out1.mp3
[me@host: Videos]$ ffmpeg -y -i orig.mp3 -i bg1.jpg -i bg2.jpg \
> -map 0:a \
> -map 1:v -metadata:s:v:0 comment="Cover (front)" \
> -map 2:v -metadata:s:v:1 comment="Cover (back)" \
> -id3v2_version 3 \
> -c:a copy \
> out2.mp3
The stream metadata tag comment (set with -metadata:s in the example above) maps to the APIC picture type. (See “4.14 APIC Attached picture” of id3v2.4.0-frames.txt.)
Ideally, the -id3v2_version option would not be needed, but “legacy” tools require it, as described at ffmpeg-all.html#mp3. (Windows Explorer on Windows 7 in particular.)
When multiple images are embedded, as in the second example, which one is displayed seems to depend on the tool:
For ffplay you can switch by pressing “v”.
Windows Media Player and Windows Explorer (the preview pane) seem to display picture types “Cover (front)” or “other”.
Media Player Classic Home Cinema seems to display the picture type “Cover (back)”.
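As a rough illustration of how the command lines above are put together, the following Python sketch assembles the same ffmpeg argument list for any number of cover images. The names build_mp3_cover_cmd and KNOWN_PICTURE_TYPES are made up for this example, and the table holds only the three APIC picture types mentioned in this section (codes as listed in id3v2.4.0-frames.txt), not the full list. Nothing is executed; the function only returns the arguments.

```python
# Subset of the ID3v2 APIC picture types (name -> type code); illustrative only.
KNOWN_PICTURE_TYPES = {
    "Other": 0x00,
    "Cover (front)": 0x03,
    "Cover (back)": 0x04,
}


def build_mp3_cover_cmd(audio, covers, out, id3v2_version=3):
    """Return an ffmpeg argument list that copies the audio of `audio`
    and attaches each (image_path, picture_type) pair in `covers`."""
    cmd = ["ffmpeg", "-y", "-i", audio]
    for image, _ in covers:
        cmd += ["-i", image]
    cmd += ["-map", "0:a"]
    for n, (_, picture_type) in enumerate(covers):
        if picture_type not in KNOWN_PICTURE_TYPES:
            raise ValueError("unknown picture type: {!r}".format(picture_type))
        cmd += [
            "-map", "{}:v".format(n + 1),  # input 0 is the audio file
            "-metadata:s:v:{}".format(n), "comment={}".format(picture_type),
        ]
    cmd += ["-id3v2_version", str(id3v2_version), "-c:a", "copy", out]
    return cmd


if __name__ == "__main__":
    print(build_mp3_cover_cmd(
        "orig.mp3",
        [("bg1.jpg", "Cover (front)"), ("bg2.jpg", "Cover (back)")],
        "out2.mp3"))
```

Passing the returned list to subprocess.check_call would run the command, assuming ffmpeg is on the PATH.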
Information about the embedded image can be obtained with ffprobe:
[me@host: Videos]$ ffprobe -hide_banner orig.mp3
[mp3 @ 0000000000346d00] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'orig.mp3':
Metadata:
title : Blizzards
artist : Riot
album : YouTube Audio Library
genre : Country & Folk
encoder : Google
Duration: 00:02:06.56, start: 0.000000, bitrate: 320 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
[me@host: Videos]$ ffprobe -hide_banner out1.mp3
Input #0, mp3, from 'out1.mp3':
Metadata:
title : Blizzards
artist : Riot
album : YouTube Audio Library
genre : Country & Folk
encoder : Lavf57.71.100
Duration: 00:02:06.56, start: 0.011995, bitrate: 1865 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Metadata:
encoder : Lavf
Stream #0:1: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc
Metadata:
comment : Cover (front)
[me@host: Videos]$ ffprobe -hide_banner out2.mp3
Input #0, mp3, from 'out2.mp3':
Metadata:
title : Blizzards
artist : Riot
album : YouTube Audio Library
genre : Country & Folk
encoder : Lavf57.71.100
Duration: 00:02:06.56, start: 0.011995, bitrate: 2308 kb/s
Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Metadata:
encoder : Lavf
Stream #0:1: Video: png, rgb24(pc), 2200x1464 [SAR 1:1 DAR 275:183], 90k tbr, 90k tbn, 90k tbc
Metadata:
comment : Cover (back)
Stream #0:2: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc
Metadata:
comment : Cover (front)
Other tools, such as mutagen-inspect, report the same pictures:
[me@host: Videos]$ mutagen-inspect out1.mp3
-- out1.mp3
- MPEG 1 layer 3, 320000 bps (CBR), 44100 Hz, 2 chn, 126.56 seconds (audio/mp3)
APIC=cover front, (image/png, 24447841 bytes)
TALB=YouTube Audio Library
TCON=Country & Folk
TIT2=Blizzards
TPE1=Riot
TSSE=Lavf57.71.100
[me@host: Videos]$ mutagen-inspect out2.mp3
-- out2.mp3
- MPEG 1 layer 3, 320000 bps (CBR), 44100 Hz, 2 chn, 126.56 seconds (audio/mp3)
APIC=cover back, (image/png, 7002791 bytes)
APIC=cover front, (image/png, 24447841 bytes)
TALB=YouTube Audio Library
TCON=Country & Folk
TIT2=Blizzards
TPE1=Riot
TSSE=Lavf57.71.100
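If the picture comments need to be checked from a script rather than by eye, a listing like the ffprobe output above can be scanned with a couple of regular expressions. This is only a sketch: list_video_streams and SAMPLE are names invented here, SAMPLE is abbreviated from the out2.mp3 listing, and its indentation is an assumption (the patterns ignore leading whitespace for that reason).

```python
import re

# "Stream #0:1: Video: png, ..." or "Stream #0:0(und): Video: h264 ..."
_STREAM_RE = re.compile(r"^\s*Stream #(\d+):(\d+)[^:]*: (\w+): (\w+)")
_COMMENT_RE = re.compile(r"^\s*comment\s*:\s*(.*\S)\s*$")

SAMPLE = """\
Input #0, mp3, from 'out2.mp3':
  Metadata:
    title           : Blizzards
  Duration: 00:02:06.56, start: 0.011995, bitrate: 2308 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
    Metadata:
      encoder         : Lavf
    Stream #0:1: Video: png, rgb24(pc), 2200x1464 [SAR 1:1 DAR 275:183], 90k tbr, 90k tbn, 90k tbc
    Metadata:
      comment         : Cover (back)
    Stream #0:2: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc
    Metadata:
      comment         : Cover (front)
"""


def list_video_streams(listing):
    """Return one dict per video stream found in an ffprobe text listing."""
    streams = []
    current = None  # the video stream whose metadata is being read, if any
    for line in listing.splitlines():
        m = _STREAM_RE.match(line)
        if m:
            current = None
            if m.group(3) == "Video":
                current = {
                    "index": int(m.group(2)),
                    "codec": m.group(4),
                    "comment": None,
                }
                streams.append(current)
            continue
        m = _COMMENT_RE.match(line)
        if m and current is not None and current["comment"] is None:
            current["comment"] = m.group(1)
    return streams


if __name__ == "__main__":
    for stream in list_video_streams(SAMPLE):
        print(stream)
```

Parsing the human-readable listing is fragile across ffmpeg versions, so treat the patterns as a starting point rather than a stable interface.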
For MP4 (etc.)
- doc: “-disposition” (Advanced options).
For video files such as MP4, you can do it with -disposition like this:
[me@host: Videos]$ "/c/Program Files/ffmpeg-4.1-win64-shared/bin/ffmpeg" -y \
> -i "Pexels Videos 4105.mp4" -i bg1.jpg \
> -map 0:v -c:v:0 copy \
> -map 1:v -disposition:v:1 attached_pic -c:v:1 png \
> out3.mp4
[me@host: Videos]$ ffprobe -hide_banner out3.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 000000000032afe0] stream 0, timescale not set
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'out3.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.20.100
Duration: 00:00:21.00, start: 0.000000, bitrate: 13223 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 3906 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : L-SMASH Video Handler
Stream #0:1: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc
Images embedded in this way can be viewed in the Explorer’s preview pane on Windows.
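To confirm from a script that a stream really carries the attached_pic disposition, one option is to ask ffprobe for JSON (-show_streams -of json) and look at each stream's disposition entry. The sketch below assumes ffprobe is on the PATH; probe_streams and attached_pic_indexes are hypothetical helper names, not part of ffmpeg.

```python
import json
import subprocess


def probe_streams(path):
    """Run ffprobe on `path` and return its list of stream descriptions."""
    out = subprocess.check_output([
        "ffprobe", "-v", "error", "-show_streams", "-of", "json", path])
    return json.loads(out.decode("utf-8"))["streams"]


def attached_pic_indexes(streams):
    """Return the indexes of streams flagged with the attached_pic disposition."""
    return [
        stream["index"]
        for stream in streams
        if stream.get("disposition", {}).get("attached_pic") == 1
    ]


if __name__ == "__main__":
    import sys
    for filename in sys.argv[1:]:
        print(filename, attached_pic_indexes(probe_streams(filename)))
```

Keeping the JSON inspection separate from the ffprobe call means the check can be exercised on hand-written stream dictionaries without running ffprobe at all.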
Here is an example Python script that builds a tiled thumbnail sheet from a video and, optionally, embeds it in that video as a cover image:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io
import os
import sys
import re
import subprocess
import tempfile
import shutil
import logging

_log = logging.getLogger()

if hasattr("", "decode"):
    # Python 2: command-line items must be byte strings
    _encode = lambda s: s.encode(sys.getfilesystemencoding())
else:
    _encode = lambda s: s


def _filter_args(*cmd):
    """
    Drop None items and encode the remaining items to bytes
    (in Python 2).
    """
    return list(map(_encode, filter(None, *cmd)))


def check_call(*popenargs, **kwargs):
    """
    Basically just forward the arguments to subprocess.check_call, but
    do two things first:

    * Encode the command items to bytes in Python 2.
    * Omit `None` items from the command.
    """
    # pop (not get) so that "args" is not passed to subprocess twice
    cmd = kwargs.pop("args", None)
    if cmd is None:
        cmd = popenargs[0]
    subprocess.check_call(_filter_args(cmd), **kwargs)


def parse_time(s):
    """
    >>> print("%.3f" % parse_time(3.2))
    3.200
    >>> print("%.3f" % parse_time(3))
    3.000
    >>> print("%.3f" % parse_time("00:00:01"))
    1.000
    >>> print("%.3f" % parse_time("00:00:01.3"))
    1.300
    >>> print("%.3f" % parse_time("00:00:01.34"))
    1.340
    >>> print("%.3f" % parse_time("00:00:01.034"))
    1.034
    >>> print("%.3f" % parse_time("00:00:01.345"))
    1.345
    >>> print("%.3f" % parse_time("00:01:01.345"))
    61.345
    >>> print("%.3f" % parse_time("02:01:01.345"))
    7261.345
    >>> print("%.3f" % parse_time("01:01.345"))
    61.345
    """
    try:
        return float(s)
    except ValueError:
        if "." in s:
            n, _, ss = s.rpartition(".")
        else:
            n, ss = s, "0"
        n = n.split(":")
        if len(n) > 3:
            raise ValueError("'{}' is not valid time.".format(s))
        result = sum([
            p * 60**(len(n) - 1 - i)
            for i, p in enumerate(list(map(int, n)))])
        result += int(ss) / float((10**len(ss)))
        return result


def _ts_to_tss(ts, frac=3):
    d, _, f = (("%%.%df" % frac) % ts).partition(".")
    d = abs(int(d))
    ss_h = int(d / 3600)
    d -= ss_h * 3600
    ss_m = int(d / 60)
    d -= ss_m * 60
    ss_s = int(d)
    return "%s%02d:%02d:%02d.%s" % (
        "" if ts >= 0 else "-",
        ss_h, ss_m, ss_s, f)


def _get_videoinfo(video):
    cmdl = [
        "ffprobe",
        "-hide_banner",
        # "-show_streams",
        video,
    ]
    # ffprobe prints its listing to stderr, so fold stderr into the captured
    # output instead of going through a temporary file
    stderrout = subprocess.check_output(
        _filter_args(cmdl), stderr=subprocess.STDOUT)
    # br"..." (rather than rb"...") is accepted by both Python 2 and 3
    dur = re.search(br" Duration: ([\d:.]+),", stderrout).group(1).decode()
    geom = re.search(
        br"Stream .*: Video: .*, (\d+x\d+).*\(default\)", stderrout).group(1).decode()
    atpi = re.search(br"Stream .*: Video: .*,.*\(attached pic\)", stderrout)
    return parse_time(dur), tuple(map(int, geom.split("x"))), atpi


def _mk_thumbsimage(video, shape, **ffkwarg):
    vffmt = "scale={}:{},setsar=1,drawtext='fontsize=24:fontcolor=white:text={}':x=10:y=10"
    dur, geom, _ = _get_videoinfo(video)
    # grab shape[0] * shape[1] evenly spaced frames, each one scaled down
    # and stamped with its timestamp
    timgs = []
    for i, ss in enumerate(
            [min(dur / (shape[0] * shape[1]) * i + 1.0, dur)
             for i in range((shape[0] * shape[1]))]):
        ofn = (tempfile.mktemp() + ".jpg").replace("\\", "/")
        cmdl = [
            "ffmpeg", "-hide_banner", "-y",
            "-ss", "{}".format(ss),
            "-i", video,
            "-vf",
            vffmt.format(
                geom[0]//shape[0], geom[1]//shape[1],
                _ts_to_tss(ss, 1).replace(":", "\\:")),
            "-r", "1/1", "-t", "1",
            ofn
        ]
        check_call(cmdl, **ffkwarg)
        timgs.append(ofn)
    cmdl = [
        "ffmpeg", "-hide_banner", "-y",
    ]
    for img in timgs:
        cmdl.extend(["-i", img])
    ofn = video + ".jpg"
    # tile the frames: hstack the images of each row, then vstack the rows
    tmp = []
    for row in range(shape[1]):
        tmp.append(
            "".join([
                "[{}:v]".format(row * shape[0] + col)
                for col in range(shape[0])]) + "hstack={}[v{}]".format(
                    shape[0], row))
    vf = ";".join(tmp) + ";" + "".join([
        "[v{}]".format(row)
        for row in range(shape[1])]) + "vstack={}".format(shape[1])
    cmdl.extend([
        "-filter_complex",
        vf,
        "-an",
        ofn
    ])
    try:
        check_call(cmdl, **ffkwarg)
    finally:
        for fn in timgs:
            os.remove(fn)
    return ofn


def _add_embeded_coverimg_to_video(video, thmbimgfn):
    tmpdir = tempfile.mkdtemp()
    bn = os.path.basename(video)
    ifn = os.path.join(tmpdir, bn)
    ofntmp = os.path.join(tmpdir, "_tmpout_" + bn)
    shutil.copyfile(video, ifn)
    # NOTE: 1. you need ffmpeg 4.0 or newer for "attached_pic"
    #       2. "attached_pic" is probably not valid for matroska
    cmdl = [
        "ffmpeg", "-hide_banner", "-y", "-i", ifn,
        "-i", thmbimgfn,
        "-map", "0:v",
        "-map", "0:a",
        "-map", "1:v",
        "-c", "copy",
        "-disposition:v:1", "attached_pic",
        ofntmp]
    # NOTE: ffkwarg here is the module-level dict set up in the __main__ block
    check_call(cmdl, **ffkwarg)
    # check result
    _, _, atpi = _get_videoinfo(ofntmp)
    if atpi:
        os.remove(video)
        shutil.move(ofntmp, video)
        os.remove(thmbimgfn)
    else:
        # There is no concept of "attached_pic" in mkv video, but ffmpeg
        # ignores that fact and embeds the image as one of the video streams.
        # Unfortunately, almost no video player handles this correctly.
        # As far as I know, only VLC media player and ffplay behave nicely;
        # the behavior of other players varies. The most unpleasant case is
        # when the duration is treated as zero.
        # On the other hand, if the output container is WebM, ffmpeg
        # rejects the still image embedding itself. That is, the script dies
        # with an exception before reaching this point.
        os.remove(ofntmp)
        _log.error(
            "'%s' does not seem to support 'attached_pic'",
            os.path.splitext(video)[1])
    os.remove(ifn)
    os.rmdir(tmpdir)


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    import argparse
    ap = argparse.ArgumentParser()
    ap.add_argument("video", nargs="+")
    ap.add_argument("--shape", default="[5, 4]")
    ap.add_argument("--quiet", action="store_true")
    ap.add_argument("--add_embeded_coverimg_to_video", action="store_true")
    args = ap.parse_args()
    ffkwarg = {}
    # accept "5,4", "[5, 4]" or "(5, 4)" (columns, rows) without using eval()
    m = re.match(r"[\[(]?(\d+),(\d+)[)\]]?$", re.sub(r"\s+", "", args.shape))
    if m:
        shape = (int(m.group(1)), int(m.group(2)))
    else:
        ap.error("invalid shape: {}".format(args.shape))
    if args.quiet:
        ffkwarg.update(dict(stderr=subprocess.DEVNULL))
    for video in args.video:
        thmbimgfn = _mk_thumbsimage(video, shape, **ffkwarg)
        if args.add_embeded_coverimg_to_video:
            _add_embeded_coverimg_to_video(video, thmbimgfn)
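
The two calculations at the heart of _mk_thumbsimage, picking the timestamps and building the -filter_complex string, are easier to follow in isolation. The standalone sketch below reproduces them under the same formulas as the script; the function names sample_times and tile_filtergraph are invented for this illustration and do not appear in the script.

```python
def sample_times(duration, count):
    """Timestamps (in seconds) at which frames are grabbed: evenly spaced,
    shifted by one second, and clamped to the duration."""
    return [min(duration / count * i + 1.0, duration) for i in range(count)]


def tile_filtergraph(cols, rows):
    """Filtergraph that tiles cols * rows inputs: each row of inputs is
    joined with hstack, then the rows are joined with vstack."""
    row_filters = []
    for row in range(rows):
        inputs = "".join(
            "[{}:v]".format(row * cols + col) for col in range(cols))
        row_filters.append("{}hstack={}[v{}]".format(inputs, cols, row))
    row_labels = "".join("[v{}]".format(row) for row in range(rows))
    return ";".join(row_filters) + ";" + row_labels + "vstack={}".format(rows)


if __name__ == "__main__":
    print(sample_times(10.0, 4))
    # → [1.0, 3.5, 6.0, 8.5]
    print(tile_filtergraph(2, 2))
    # → [0:v][1:v]hstack=2[v0];[2:v][3:v]hstack=2[v1];[v0][v1]vstack=2
```

With the script's default --shape of [5, 4], the same construction yields four hstack=5 rows followed by a single vstack=4.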