_merge_mpd_periods merges fragments from formats in the same period if the formats are functionally identical #10200

Closed
10 tasks done
auoie opened this issue Jun 17, 2024 · 1 comment
Labels: bug (Bug that is not site-specific), DRM (The referred content is DRM protected)

Comments


auoie commented Jun 17, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Provide a description that is worded well enough to be understood

If an MPEG-DASH file has a period with multiple "functionally identical"[1] tracks where each of those tracks consists of multiple fragments, then yt-dlp will concatenate these identical tracks into a single track.

For example, see Puyodead1/udemy-downloader#215. I have an .mpd file with a single period. It contains 6 identical audio tracks, each consisting of many fragments, and 6 video tracks at different resolutions, each also consisting of many fragments.

  • ffmpeg -i index.mpd shows 6 audio tracks and 6 video tracks. I'm able to download a specific audio track using ffmpeg -i index.mpd -map 0:p:0:6 -c copy output.m4a. The file is 1.05M.
  • Running yt-dlp --allow-unplayable-formats --enable-file-urls -F file://$(pwd)/index.mpd shows 6 video tracks and 1 audio track. There being only 1 audio track seems reasonable, but when I download the track with yt-dlp --allow-unplayable-formats --enable-file-urls -f 7 file://$(pwd)/index.mpd, it downloads a 6.78 MB file. It's concatenating the fragments from all 6 audio tracks. I would expect it to download only a single copy, not all 6 copies.
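One way to confirm how many representations a period really declares, independent of ffmpeg or yt-dlp, is to count them straight from the manifest XML. The following is a minimal sketch using only the standard library. The manifest here is a made-up stand-in (not the one from this report), and it assumes that contentType is set on each AdaptationSet, which real manifests do not always do.

```python
import xml.etree.ElementTree as ET

# Namespace used by MPEG-DASH manifests.
NS = {'mpd': 'urn:mpeg:dash:schema:mpd:2011'}

# Hypothetical single-period manifest: 6 identical audio representations
# and 6 video representations that differ in height.
audio = ''.join(
    f'<Representation id="a{i}" bandwidth="128000" codecs="mp4a.40.2"/>'
    for i in range(6))
video = ''.join(
    f'<Representation id="v{h}" bandwidth="{h * 1000}" height="{h}" codecs="avc1.64001f"/>'
    for h in (144, 360, 480, 720, 1080, 1440))
MANIFEST = (
    '<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"><Period id="0">'
    '<AdaptationSet contentType="audio">' + audio + '</AdaptationSet>'
    '<AdaptationSet contentType="video">' + video + '</AdaptationSet>'
    '</Period></MPD>'
)


def count_representations(mpd_text):
    """Return {(period_id, content_type): number_of_representations}."""
    root = ET.fromstring(mpd_text)
    counts = {}
    for period in root.findall('mpd:Period', NS):
        for adaptation_set in period.findall('mpd:AdaptationSet', NS):
            # Assumption: contentType is present on the AdaptationSet.
            kind = adaptation_set.get('contentType', 'unknown')
            key = (period.get('id'), kind)
            n = len(adaptation_set.findall('mpd:Representation', NS))
            counts[key] = counts.get(key, 0) + n
    return counts


print(count_representations(MANIFEST))
```

For a real file, the text could be read from index.mpd and passed to count_representations in place of MANIFEST.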

This was introduced in 4ce57d3. Using a commit before that, yt-dlp shows 6 video tracks and 6 audio tracks. After that commit, if two formats have the same format_key, then yt-dlp concatenates the fragments from those formats. This makes sense for formats from separate periods, but doesn't make sense for formats from the same period.
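The behaviour described above can be reproduced with a small standalone model of the merging loop. This is a simplified sketch, not the actual yt-dlp method: merge_periods and make_audio are hypothetical names, the format dicts are invented, and the line that extends the fragment list is an assumption based on the description that fragments are concatenated. The figure of 24 fragments per track is chosen only so that the total matches the 144 fragments in the log below; the real per-track count is not stated in this report.

```python
EXCLUDED_KEYS = ('format_id', 'fragments', 'manifest_stream_number')


def format_key(f):
    # Same key expression as in the diff: every value except the excluded fields.
    return tuple(v for k, v in f.items() if k not in EXCLUDED_KEYS)


def merge_periods(periods):
    """Simplified model of the merging loop as it behaves after 4ce57d3."""
    formats = {}
    for period in periods:
        for f in period['formats']:
            key = format_key(f)
            if key not in formats:
                formats[key] = f
            elif 'fragments' in f:
                # Assumption: fragments of an already-seen key are appended.
                formats[key].setdefault('fragments', []).extend(f['fragments'])
    return list(formats.values())


def make_audio(format_id, n_fragments):
    # Hypothetical audio format; only format_id differs between copies.
    return {
        'format_id': format_id,
        'acodec': 'mp4a.40.2',
        'tbr': 128,
        'fragments': [{'path': f'seg-{i}.m4s'} for i in range(n_fragments)],
    }


# One period containing 6 functionally identical audio tracks.
single_period = [{'formats': [make_audio(str(i), 24) for i in range(6)]}]
merged = merge_periods(single_period)
print(len(merged), len(merged[0]['fragments']))  # 1 144
```

The six formats collapse into one entry, but that entry carries the fragments of all six copies, which matches the oversized download.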

A possible fix is to preprocess each period so that the loop visits only one format per distinct format_key within that period. This preserves the current cross-period merging while ensuring that duplicate formats from the same period are not downloaded again.

  formats, subtitles = {}, {}
  for period in periods:
+     unique_formats: dict = {}
      for f in period['formats']:
+         format_key = tuple(v for k, v in f.items() if k not in
+             ('format_id', 'fragments', 'manifest_stream_number'))
+         if format_key not in unique_formats:
+             unique_formats[format_key] = f
+     for format_key, f in unique_formats.items():
          assert 'is_dash_periods' not in f, 'format already processed'
          f['is_dash_periods'] = True
-         format_key = tuple(v for k, v in f.items() if k not in (
-             ('format_id', 'fragments', 'manifest_stream_number')))
          if format_key not in formats:
              formats[format_key] = f
          elif 'fragments' in f:

  1. The term "functionally identical" is taken from the ISO/IEC 23009-1 spec publicly available at https://standards.iso.org/ittf/PubliclyAvailableStandards/c083314_ISO_IEC%2023009-1_2022(en).zip.
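The fix proposed in the diff above can be tried out in isolation with a standalone sketch. As before, this is a simplified model rather than the real yt-dlp method: merge_periods_deduplicated and make_audio are hypothetical names, the format dicts are invented, and the fragment-extending line is an assumption, since the diff is cut off at that point.

```python
EXCLUDED_KEYS = ('format_id', 'fragments', 'manifest_stream_number')


def format_key(f):
    # Same key expression as in the diff: every value except the excluded fields.
    return tuple(v for k, v in f.items() if k not in EXCLUDED_KEYS)


def merge_periods_deduplicated(periods):
    """Model of the proposed fix: keep one format per key within each period."""
    formats = {}
    for period in periods:
        # First pass: drop duplicates inside this period (first one wins).
        unique_formats = {}
        for f in period['formats']:
            unique_formats.setdefault(format_key(f), f)
        # Second pass: merge across periods as before.
        for key, f in unique_formats.items():
            if key not in formats:
                formats[key] = f
            elif 'fragments' in f:
                # Assumption: fragments of an already-seen key are appended.
                formats[key].setdefault('fragments', []).extend(f['fragments'])
    return list(formats.values())


def make_audio(format_id, period_name, n_fragments):
    # Hypothetical audio format; only format_id and fragments vary.
    return {
        'format_id': format_id,
        'acodec': 'mp4a.40.2',
        'tbr': 128,
        'fragments': [{'path': f'{period_name}-seg-{i}.m4s'}
                      for i in range(n_fragments)],
    }


# Two periods, each with 3 identical audio tracks of 4 fragments.
periods = [
    {'formats': [make_audio(str(i), 'p0', 4) for i in range(3)]},
    {'formats': [make_audio(str(i), 'p1', 4) for i in range(3)]},
]
merged = merge_periods_deduplicated(periods)
paths = [frag['path'] for frag in merged[0]['fragments']]
print(len(merged), len(paths))  # 1 8
```

Within each period only the first copy contributes fragments, while the two periods are still concatenated, so the result has 8 fragments rather than 24.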

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', '--allow-unplayable-formats', '--enable-file-urls', '-f', '7', 'file://$(pwd)/index.mpd']
WARNING: You have asked for UNPLAYABLE formats to be listed/downloaded. This is a developer option intended for debugging. 
         If you experience any issues while using this option, DO NOT open a bug report
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp [12b248ce6] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 5dbac313a
[debug] Python 3.12.3 (CPython x86_64 64bit) - Linux-6.9.3-arch1-1-x86_64-with-glibc2.39 (OpenSSL 3.3.1 4 Jun 2024, glibc 2.39)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.06.02, mutagen-1.47.0, requests-2.32.3, sqlite3-3.46.0, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1821 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp)
[generic] Extracting URL: file://$(pwd)/yt-dlp/index.mpd
[generic] index: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] index: Extracting information
[debug] Identified a DASH manifest
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] index: Downloading 1 format(s): 7
[debug] Invoking dashsegments downloader on "file://$(pwd)/yt-dlp/index.mpd"
[dashsegments] Total fragments: 144
[download] Destination: index [index].m4a
[download] 100% of    6.46MiB in 00:00:01 at 4.49MiB/s

auoie added the bug (Bug that is not site-specific) and triage (Untriaged issue) labels Jun 17, 2024
@bashonly
Member

Please give an example that is not DRM protected and the issue can be re-opened

bashonly closed this as not planned Jun 17, 2024
bashonly added the DRM (The referred content is DRM protected) label and removed the triage (Untriaged issue) label Jun 17, 2024