Adding album art (an embedded cover/thumbnail)

If you want to do this, it’s probably for the “icon view” of a media list, such as in Windows Explorer or on an iPod. Some players, such as Windows Media Player, also display album art while playing audio files.

It’s not difficult, but tool support varies, so the results may be disappointing. I only have Windows 7, so I will use it for the examples.

For MP3

Documentation: ffmpeg-all.html#mp3

In the case of MP3, you can do it like this:

[me@host: Videos]$ ffmpeg -y -i orig.mp3 -i bg1.jpg \
> -map 0:a \
> -map 1:v -metadata:s:v comment="Cover (front)" \
> -id3v2_version 3 \
> -c:a copy \
> out1.mp3
[me@host: Videos]$ ffmpeg -y -i orig.mp3 -i bg1.jpg -i bg2.jpg \
> -map 0:a \
> -map 1:v -metadata:s:v:0 comment="Cover (front)" \
> -map 2:v -metadata:s:v:1 comment="Cover (back)" \
> -id3v2_version 3 \
> -c:a copy \
> out2.mp3
[Figure: album cover preview in Windows Explorer]
  • The stream metadata tag comment (set with -metadata:s in the examples above) maps to the APIC picture type. (See “4.14 APIC Attached picture” in id3v2.4.0-frames.txt.)

  • Ideally the -id3v2_version option would not be needed, but “legacy” tools require it, as described at ffmpeg-all.html#mp3. (ffmpeg writes ID3v2.4 by default; -id3v2_version 3 makes it write ID3v2.3 instead.) Windows Explorer on Windows 7 is one such tool.

  • When multiple images are embedded, as in the second example, which one is displayed seems to depend on the tool:

    • In ffplay, you can switch between them by pressing “v”.

    • Windows Media Player and Windows Explorer (the preview pane) seem to display the picture of type “Cover (front)” or “Other”.

    • Media Player Classic Home Cinema seems to display the picture of type “Cover (back)”.
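The two MP3 commands above differ only in how many images are mapped. The sketch below builds the same argument list for any number of covers and rejects comment strings that are not APIC picture type names. It assumes Python 3; the helper name build_mp3_cover_cmd is hypothetical, and the name list is copied from section 4.14 of id3v2.4.0-frames.txt in type-code order.

```python
# Picture type names from "4.14 APIC Attached picture" (id3v2.4.0-frames.txt),
# in the order of their type codes $00..$14.
APIC_PICTURE_TYPES = (
    "Other", "32x32 pixels 'file icon'", "Other file icon",
    "Cover (front)", "Cover (back)", "Leaflet page",
    "Media (e.g. label side of CD)", "Lead artist/lead performer/soloist",
    "Artist/performer", "Conductor", "Band/Orchestra", "Composer",
    "Lyricist/text writer", "Recording Location", "During recording",
    "During performance", "Movie/video screen capture",
    "A bright coloured fish", "Illustration", "Band/artist logotype",
    "Publisher/Studio logotype",
)


def build_mp3_cover_cmd(src, covers, dst, id3v2_version=3):
    """Hypothetical helper: return an ffmpeg argv that copies the audio of
    `src` into `dst` and embeds `covers`, a list of
    (image_path, picture_type) pairs, as APIC pictures."""
    cmd = ["ffmpeg", "-y", "-i", src]
    for image, _ in covers:
        cmd += ["-i", image]
    cmd += ["-map", "0:a"]
    for n, (_, picture_type) in enumerate(covers):
        if picture_type not in APIC_PICTURE_TYPES:
            raise ValueError("unknown APIC picture type: %r" % picture_type)
        # Input n + 1 holds the n-th image; it becomes output video stream n,
        # and its "comment" tag selects the APIC picture type.
        cmd += ["-map", "%d:v" % (n + 1),
                "-metadata:s:v:%d" % n, "comment=%s" % picture_type]
    cmd += ["-id3v2_version", str(id3v2_version), "-c:a", "copy", dst]
    return cmd
```

Passing build_mp3_cover_cmd("orig.mp3", [("bg1.jpg", "Cover (front)"), ("bg2.jpg", "Cover (back)")], "out2.mp3") to subprocess.check_call should be equivalent to the second command above. Because the list goes straight to subprocess, no shell quoting is needed around "Cover (front)".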

You can inspect the embedded images with ffprobe:

[me@host: Videos]$ ffprobe -hide_banner orig.mp3
[mp3 @ 0000000000346d00] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'orig.mp3':
  Metadata:
    title           : Blizzards
    artist          : Riot
    album           : YouTube Audio Library
    genre           : Country & Folk
    encoder         : Google
  Duration: 00:02:06.56, start: 0.000000, bitrate: 320 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
[me@host: Videos]$ ffprobe -hide_banner out1.mp3
Input #0, mp3, from 'out1.mp3':
  Metadata:
    title           : Blizzards
    artist          : Riot
    album           : YouTube Audio Library
    genre           : Country & Folk
    encoder         : Lavf57.71.100
  Duration: 00:02:06.56, start: 0.011995, bitrate: 1865 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
    Metadata:
      encoder         : Lavf
    Stream #0:1: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc
    Metadata:
      comment         : Cover (front)
[me@host: Videos]$ ffprobe -hide_banner out2.mp3
Input #0, mp3, from 'out2.mp3':
  Metadata:
    title           : Blizzards
    artist          : Riot
    album           : YouTube Audio Library
    genre           : Country & Folk
    encoder         : Lavf57.71.100
  Duration: 00:02:06.56, start: 0.011995, bitrate: 2308 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
    Metadata:
      encoder         : Lavf
    Stream #0:1: Video: png, rgb24(pc), 2200x1464 [SAR 1:1 DAR 275:183], 90k tbr, 90k tbn, 90k tbc
    Metadata:
      comment         : Cover (back)
    Stream #0:2: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc
    Metadata:
      comment         : Cover (front)
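The listings above are intended for people to read. For scripts, ffprobe can print the same stream information as JSON with -show_streams -of json, and -v error suppresses the log lines. The following is a minimal sketch that lists the video (picture) streams together with their comment tag. It assumes ffprobe is on PATH and that each stream object in the JSON has codec_type and codec_name, an optional tags dict, and a disposition dict containing an attached_pic flag. The function names are hypothetical.

```python
import json
import subprocess


def picture_streams(probe):
    """Return (index, codec_name, comment, attached_pic) for every video
    stream in a parsed `ffprobe -show_streams -of json` report."""
    result = []
    for st in probe.get("streams", []):
        if st.get("codec_type") != "video":
            continue
        # For MP3 input, the APIC picture type shows up as the "comment" tag.
        comment = st.get("tags", {}).get("comment")
        attached = bool(st.get("disposition", {}).get("attached_pic", 0))
        result.append((st.get("index"), st.get("codec_name"), comment, attached))
    return result


def probe_streams(path):
    """Run ffprobe on `path` and return its JSON stream report as a dict."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        stdout=subprocess.PIPE, check=True).stdout
    return json.loads(out.decode("utf-8"))
```

For example, picture_streams(probe_streams("out2.mp3")) should list the two PNG streams with the comments “Cover (back)” and “Cover (front)”. As far as I know, ID3 pictures are also flagged attached_pic.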

Other tools, such as mutagen-inspect, show the same pictures as APIC frames:

[me@host: Videos]$ mutagen-inspect out1.mp3
-- out1.mp3
- MPEG 1 layer 3, 320000 bps (CBR), 44100 Hz, 2 chn, 126.56 seconds (audio/mp3)
APIC=cover front,  (image/png, 24447841 bytes)
TALB=YouTube Audio Library
TCON=Country & Folk
TIT2=Blizzards
TPE1=Riot
TSSE=Lavf57.71.100
[me@host: Videos]$ mutagen-inspect out2.mp3
-- out2.mp3
- MPEG 1 layer 3, 320000 bps (CBR), 44100 Hz, 2 chn, 126.56 seconds (audio/mp3)
APIC=cover back,   (image/png, 7002791 bytes)
APIC=cover front,  (image/png, 24447841 bytes)
TALB=YouTube Audio Library
TCON=Country & Folk
TIT2=Blizzards
TPE1=Riot
TSSE=Lavf57.71.100
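Neither listing shows which ID3v2 version was actually written, and that version is exactly what -id3v2_version controls. An ID3v2 tag begins with a 10-byte header made up of these fields:

* the bytes "ID3"
* a major-version byte
* a revision byte
* a flags byte
* a 4-byte "syncsafe" size, which stores 7 bits per byte

The sketch below parses that header. It assumes the tag sits at the very start of the file, which is where ffmpeg writes it. The helper name parse_id3v2_header is hypothetical.

```python
def parse_id3v2_header(data):
    """Parse the 10-byte ID3v2 header at the start of `data` (bytes).

    Returns None if `data` does not start with an ID3v2 tag; otherwise a
    dict with the version string (e.g. "2.3.0"), the flags byte, and the
    tag size, which excludes the header (and footer, if any).
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for b in data[6:10]:
        # A syncsafe integer keeps the top bit of every byte clear.
        if b & 0x80:
            raise ValueError("tag size is not a syncsafe integer")
        size = (size << 7) | b  # 7 payload bits per byte
    return {"version": "2.%d.%d" % (major, revision),
            "flags": flags, "tag_size": size}
```

For example, with open("out1.mp3", "rb") as f: print(parse_id3v2_header(f.read(10))). Because out1.mp3 was written with -id3v2_version 3, its version should be "2.3.0".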

For MP4 (and similar containers)

Documentation: “-disposition” (Advanced options).

For video containers such as MP4, you can use -disposition like this:

[me@host: Videos]$ "/c/Program Files/ffmpeg-4.1-win64-shared/bin/ffmpeg" -y \
> -i "Pexels Videos 4105.mp4" -i bg1.jpg \
> -map 0:v -c:v:0 copy \
> -map 1:v -disposition:v:1 attached_pic -c:v:1 png \
> out3.mp4
[me@host: Videos]$ ffprobe -hide_banner out3.mp4
[mov,mp4,m4a,3gp,3g2,mj2 @ 000000000032afe0] stream 0, timescale not set
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'out3.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:21.00, start: 0.000000, bitrate: 13223 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 3906 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
    Metadata:
      handler_name    : L-SMASH Video Handler
    Stream #0:1: Video: png, rgb24(pc), 4000x2250 [SAR 1:1 DAR 16:9], 90k tbr, 90k tbn, 90k tbc

Images embedded this way are displayed in the preview pane of Windows Explorer.
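In the MP4 command above, -c:v:1 and -disposition:v:1 refer to the second *output* video stream. That stream is the image only because, as here, the input has exactly one video stream. The sketch below derives that index from the number of input video streams and also carries over any audio. The trailing ? in -map 0:a? makes that mapping optional, so an input without audio, like the one above, still works. The helper name build_mp4_cover_cmd is hypothetical.

```python
def build_mp4_cover_cmd(video, image, output, n_video_streams=1):
    """Hypothetical helper: return an ffmpeg argv that copies `video` and
    adds `image` to `output` as a PNG attached picture.

    `n_video_streams` is the number of video streams in `video`; they are
    mapped first, so the image becomes output video stream n_video_streams.
    """
    pic = n_video_streams
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", image,
        "-map", "0:v", "-map", "0:a?", "-map", "1:v",
        "-c", "copy",                # copy every stream by default...
        "-c:v:%d" % pic, "png",      # ...but re-encode the image as PNG
        "-disposition:v:%d" % pic, "attached_pic",
        output,
    ]
```

Putting -c copy before the more specific -c:v:N follows the pattern in the ffmpeg documentation for -codec: all streams are copied except the one stream that the later, more specific option selects.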

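Here is an example Python script that builds a thumbnail sheet (a grid of frames sampled across the video) next to each video. When run with --add_embeded_coverimg_to_video, it also embeds that sheet into the video as an attached picture: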

add_embthumb_to_video.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import sys
import re
import subprocess
import tempfile
import shutil
import logging


_log = logging.getLogger()


if hasattr("", "decode"):
    _encode = lambda s: s.encode(sys.getfilesystemencoding())
else:
    _encode = lambda s: s


def _filter_args(cmd):
    """
    Drop `None` items from `cmd` and, on Python 2, encode the remaining
    items to bytes.
    """
    return list(map(_encode, filter(None, cmd)))


def check_call(*popenargs, **kwargs):
    """
    Forward the arguments to subprocess.check_call, with two differences:
    * `None` items in the command are dropped.
    * On Python 2, the command items are encoded to bytes.
    """
    cmd = kwargs.pop("args", None)
    if cmd is None:
        cmd, popenargs = popenargs[0], popenargs[1:]
    subprocess.check_call(_filter_args(cmd), *popenargs, **kwargs)


def parse_time(s):
    """
    >>> print("%.3f" % parse_time(3.2))
    3.200
    >>> print("%.3f" % parse_time(3))
    3.000
    >>> print("%.3f" % parse_time("00:00:01"))
    1.000
    >>> print("%.3f" % parse_time("00:00:01.3"))
    1.300
    >>> print("%.3f" % parse_time("00:00:01.34"))
    1.340
    >>> print("%.3f" % parse_time("00:00:01.034"))
    1.034
    >>> print("%.3f" % parse_time("00:00:01.345"))
    1.345
    >>> print("%.3f" % parse_time("00:01:01.345"))
    61.345
    >>> print("%.3f" % parse_time("02:01:01.345"))
    7261.345
    >>> print("%.3f" % parse_time("01:01.345"))
    61.345
    """
    try:
        return float(s)
    except ValueError:
        if "." in s:
            n, _, ss = s.rpartition(".")
        else:
            n, ss = s, "0"
        n = n.split(":")
        if len(n) > 3:
            raise ValueError("'{}' is not valid time.".format(s))
        result = sum([
            p * 60**(len(n) - 1 - i)
            for i, p in enumerate(list(map(int, n)))])
        result += int(ss) / float((10**len(ss)))
        return result


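def _ts_to_tss(ts, frac=3):
    """
    Format `ts` seconds as "[-]HH:MM:SS.fff" with `frac` fractional digits.

    >>> _ts_to_tss(3725.25)
    '01:02:05.250'
    >>> _ts_to_tss(3.5, 1)
    '00:00:03.5'
    """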
    d, _, f = (("%%.%df" % frac) % ts).partition(".")
    d = abs(int(d))
    ss_h = int(d / 3600)
    d -= ss_h * 3600
    ss_m = int(d / 60)
    d -= ss_m * 60
    ss_s = int(d)
    return "%s%02d:%02d:%02d.%s" % (
        "" if ts >= 0 else "-",
        ss_h, ss_m, ss_s, f)


def _get_videoinfo(video):
    """
    Return (duration in seconds, (width, height), has_attached_pic) for
    `video`, parsed from the report that ffprobe prints to stderr.
    """
    cmdl = ["ffprobe", "-hide_banner", video]
    proc = subprocess.run(
        _filter_args(cmdl),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    report = proc.stderr
    dur = re.search(rb" Duration: ([\d:.]+),", report).group(1).decode()
    # Prefer the video stream flagged "(default)"; otherwise fall back to
    # the first video stream that is not an attached picture.
    m = re.search(rb"Stream .*: Video: .*, (\d+x\d+).*\(default\)", report)
    if m is None:
        m = re.search(
            rb"Stream .*: Video: .*?, (\d+x\d+)(?!.*\(attached pic\))", report)
    geom = m.group(1).decode()
    atpi = re.search(rb"Stream .*: Video: .*\(attached pic\)", report)
    return parse_time(dur), tuple(map(int, geom.split("x"))), atpi is not None


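def _mk_thumbsimage(video, shape, **ffkwarg):
    """
    Grab shape[0] * shape[1] frames spread over `video`, tile them into a
    grid of shape[0] columns by shape[1] rows, write it to "<video>.jpg",
    and return that path. Keyword arguments are passed to check_call.
    """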
    vffmt = "scale={}:{},setsar=1,drawtext='fontsize=24:fontcolor=white:text={}':x=10:y=10"
    dur, geom, _ = _get_videoinfo(video)
    timgs = []
    for i, ss in enumerate(
            [min(dur / (shape[0] * shape[1]) * i + 1.0, dur)
             for i in range((shape[0] * shape[1]))]):
        ofn = (tempfile.mktemp() + ".jpg").replace("\\", "/")
        cmdl = [
            "ffmpeg", "-hide_banner", "-y",
            "-ss", "{}".format(ss),
            "-i", video,
            "-vf",
            vffmt.format(
                geom[0]//shape[0], geom[1]//shape[1],
                _ts_to_tss(ss, 1).replace(":", "\\:")),
            "-frames:v", "1",  # write exactly one frame to the .jpg
            ofn
        ]
        check_call(cmdl, **ffkwarg)
        timgs.append(ofn)
    cmdl = [
        "ffmpeg", "-hide_banner", "-y",
    ]
    for img in timgs:
        cmdl.extend(["-i", img])
    ofn = video + ".jpg"
    #
    tmp = []
    for row in range(shape[1]):
        tmp.append(
            "".join([
                "[{}:v]".format(row * shape[0] + col)
                for col in range(shape[0])]) + "hstack={}[v{}]".format(
                shape[0], row))
    vf = ";".join(tmp) + ";" + "".join([
        "[v{}]".format(row)
        for row in range(shape[1])]) + "vstack={}".format(shape[1])
    cmdl.extend([
        "-filter_complex",
        vf,
        "-an",
        ofn
    ])
    try:
        check_call(cmdl, **ffkwarg)
    finally:
        [os.remove(fn) for fn in timgs]
    return ofn


def _add_embeded_coverimg_to_video(video, thmbimgfn, **ffkwarg):
    tmpdir = tempfile.mkdtemp()
    bn = os.path.basename(video)
    ifn = os.path.join(tmpdir, bn)
    ofntmp = os.path.join(tmpdir, "_tmpout_" + bn)
    try:
        shutil.copyfile(video, ifn)
        # NOTE: 1. You need ffmpeg 4.0 or newer for "attached_pic".
        #       2. "attached_pic" is probably not valid for Matroska.
        #       3. "-disposition:v:1" assumes the input has exactly one
        #          video stream, so the image is output video stream 1.
        cmdl = [
            "ffmpeg", "-hide_banner", "-y", "-i", ifn,
            "-i", thmbimgfn,
            "-map", "0:v",
            "-map", "0:a?",  # "?": do not fail if the input has no audio
            "-map", "1:v",
            "-c", "copy",
            "-disposition:v:1", "attached_pic",
            ofntmp]
        check_call(cmdl, **ffkwarg)
        # check result
        _, _, atpi = _get_videoinfo(ofntmp)
        if atpi:
            os.remove(video)
            shutil.move(ofntmp, video)
            os.remove(thmbimgfn)
        else:
            # There is no concept of "attached_pic" in MKV, but ffmpeg
            # ignores that and embeds the image as just another video stream.
            # Unfortunately, almost no video player handles this correctly.
            # As far as I know, only VLC media player and ffplay behave
            # nicely; other players vary, and the worst case is a player
            # that treats the duration as zero.
            # On the other hand, if the output container is WebM, ffmpeg
            # rejects the still-image embedding outright, so check_call
            # raises an exception before this point is reached.
            _log.error(
                "'%s' does not seem to support 'attached_pic'",
                os.path.splitext(video)[1])
    finally:
        # Remove the copied input and any leftover temporary output.
        shutil.rmtree(tmpdir, ignore_errors=True)


if __name__ == '__main__':
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    import argparse

    ap = argparse.ArgumentParser()
    ap.add_argument("video", nargs="+")
    ap.add_argument(
        "--shape", default="[5, 4]",
        help="thumbnail grid as [columns, rows] (default: [5, 4])")
    ap.add_argument("--quiet", action="store_true")
    ap.add_argument("--add_embeded_coverimg_to_video", action="store_true")
    args = ap.parse_args()
    ffkwarg = {}
    # Parse "[5, 4]", "(5,4)" or "5,4" without eval(), so arbitrary input
    # is never executed.
    m = re.fullmatch(
        r"[\[(]?(\d+),(\d+)[)\]]?", re.sub(r"\s+", "", args.shape))
    if not m:
        ap.error("invalid shape: {}".format(args.shape))
    shape = (int(m.group(1)), int(m.group(2)))
    if args.quiet:
        ffkwarg.update(dict(stderr=subprocess.DEVNULL))
    for video in args.video:
        thmbimgfn = _mk_thumbsimage(video, shape, **ffkwarg)
        if args.add_embeded_coverimg_to_video:
            _add_embeded_coverimg_to_video(video, thmbimgfn, **ffkwarg)
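_mk_thumbsimage assembles the thumbnail sheet with a -filter_complex graph. Each row of shape[0] tiles is joined with hstack, and the resulting rows are then joined with vstack. The sketch below repeats the same string construction as a standalone function so the graph can be checked without running ffmpeg. The name grid_filtergraph is hypothetical.

```python
def grid_filtergraph(cols, rows):
    """Build the hstack/vstack filtergraph that tiles cols * rows inputs.

    Input k (0-based, row-major) lands in row k // cols, column k % cols.
    """
    row_chains = []
    for row in range(rows):
        # e.g. "[0:v][1:v]hstack=2[v0]" joins one row into label v<row>.
        inputs = "".join("[%d:v]" % (row * cols + col) for col in range(cols))
        row_chains.append("%shstack=%d[v%d]" % (inputs, cols, row))
    # Stack the labelled rows vertically to form the final grid.
    stacked = "".join("[v%d]" % row for row in range(rows))
    return ";".join(row_chains) + ";%svstack=%d" % (stacked, rows)
```

grid_filtergraph(5, 4) yields the graph for the default --shape [5, 4]. As far as I know, hstack and vstack each require at least two inputs. If so, a shape with a single column or row (for example --shape [1, 4]) would fail in the script as written.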