[YouTube] Extract web fragment URLs for livestreams #10148

Open
8 of 10 tasks
fren-archive opened this issue Jun 10, 2024 · 2 comments
Labels
site-enhancement (Feature request for some website), triage (Untriaged issue)

Comments

@fren-archive

fren-archive commented Jun 10, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

No response

Example URLs

https://youtu.be/jfKfPfyJRdk

Provide a description that is worded well enough to be understood

I want to use yt-dlp to extract the web player's DASH fragment URLs for YouTube livestreams. Right now, yt-dlp only extracts fragment URLs from the DASH manifest. The web player, on the other hand, takes them from adaptiveFormats in the page HTML (and fixes nsig). I can get the fragment URLs from the DASH manifest with yt-dlp --extractor-args 'youtube:player_client=web;formats=incomplete,duplicate' --print fragment_base_url -f 248-dash jfKfPfyJRdk, which gives:

https://rr3---sn-4o.googlevideo.com/videoplayback/expire/1717329791/ei/HwtcZuHMO6qLkucPhMOgyAQ/ip/73.212.163.75/id/jfKfPfyJRdk.2/itag/248/source/yt_live_broadcast/requiressl/yes/xpc/EgVo2aDSNQ%3D%3D/spc/UWF9fyE83WjFMibaoKWv_Jiwsw_G_Dt7xaQZg8zWIHSx1am0NgqeT9bd-nEm/vprv/1/playlist_type/DVR/ratebypass/yes/mime/video%2Fwebm/live/1/gir/yes/noclen/1/dur/5.000/rqh/1/keepalive/yes/sparams/expire,ei,ip,id,itag,source,requiressl,xpc,spc,vprv,playlist_type,ratebypass,mime,live,gir,noclen,dur,rqh/sig/AJfQdSswRQIhAPn5UhHoA5R-wlo3HjAxdUdImcD7e6DmzlIp5uRYlgUQAiBMOSz3tcD1vSZq5eEwAv0DEFmzeZ1gL1R__cgqyMzLdQ%3D%3D/initcwndbps/1875000/mh/rr/mm/44/mn/sn-4o/ms/lva/mt/1717307776/mv/m/mvi/3/pl/15/lsparams/initcwndbps,mh,mm,mn,ms,mv,mvi,pl/lsig/AHlkHjAwRQIhAINV2QJmCBRY7gX6Qd7zs_RUu_q9gMzKDGlSVGk6_ZccAiB8PxaBWEMF3g8GlL6iyIsDOkfTaW0l6LNebnWYpEcfJg%3D%3D/
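The same lookup can also be scripted through yt-dlp's Python API. This is a minimal sketch, assuming yt-dlp is installed: the extractor_args dict mirrors the CLI string above (each argument value is a list), and build_params, fragment_base_url_for and main are hypothetical helper names. It reads fragment_base_url from the 248-dash entry of the formats list instead of relying on format selection.

```python
from typing import Optional


def build_params() -> dict:
    # Python-API equivalent of:
    #   --extractor-args 'youtube:player_client=web;formats=incomplete,duplicate'
    return {
        'quiet': True,
        'extractor_args': {
            'youtube': {
                'player_client': ['web'],
                'formats': ['incomplete', 'duplicate'],
            },
        },
    }


def fragment_base_url_for(info: dict, format_id: str) -> Optional[str]:
    # Hypothetical helper: find the format with this id and return its
    # fragment_base_url (None if the format or the field is missing).
    for fmt in info.get('formats') or []:
        if fmt.get('format_id') == format_id:
            return fmt.get('fragment_base_url')
    return None


def main() -> None:
    # Imported lazily so the helpers above can be used without yt-dlp installed.
    import yt_dlp

    with yt_dlp.YoutubeDL(build_params()) as ydl:
        info = ydl.extract_info('https://youtu.be/jfKfPfyJRdk', download=False)
    print(fragment_base_url_for(info, '248-dash'))
```

Calling main() needs network access, and it still yields the manifest-derived URL, not the web-player URL this issue asks for.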

However, the fragment URLs used by the web player are different, and one form cannot be converted into the other:

https://rr4---sn-nx57ynsl.googlevideo.com/videoplayback?expire=1716898457&ei=OXZVZqvQEcTAsfIP4LW5oAk&ip=156.146.51.133&id=jfKfPfyJRdk.2&itag=248&aitags=242,243,244,247,248,278&source=yt_live_broadcast&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&hcs=ir,sd&mh=rr&mm=44,26&mn=sn-nx57ynsl,sn-a5meknds&ms=lva,onr&mv=m&mvi=4&pl=24&smhost=,rr3---sn-a5mekn6d.googlevideo.com&initcwndbps=998750&bui=AWRWj2Q1l1aXjLOowdDZq1uar_Suo6ENVmWnppLwt1A8hZBSoGDfiXHnDbI2f_yrbm_I3alif10vHJar&spc=UWF9f3jKiv5XfNEQMVPKzz2tBIdAvsl0CLg9xCX6Pxfd67fpFyQgHhGLWU9q&vprv=1&live=1&hang=1&noclen=1&svpuc=1&mime=video%2Fwebm&ns=FIiOz8Fab-jivO1lXcvGLPYQ&rqh=1&gir=yes&mt=1716876509&fvip=3&keepalive=yes&c=WEB&sefc=1&n=xcbcLceweULNjhjLl&sparams=expire,ei,ip,id,aitags,source,requiressl,xpc,bui,spc,vprv,live,hang,noclen,svpuc,mime,ns,rqh,gir&sig=AJfQdSswRQIhAO1V0td1NFiMFCdxJLpheT4iIxFmroRsgLxYBq-jXnH-AiBZ9A31wfJjbwkq2ueih25pbpoicAHYz4SdBCEEKHj2Ng%3D%3D&lsparams=hcs,mh,mm,mn,ms,mv,mvi,pl,smhost,initcwndbps&lsig=AHWaYeowRAIgZ-Mm8_dEHfg63wnXcye4x83DoctYebY7huXPGMIiK2YCIH2i163BA81u8NR5YuwdxKrJL8GWThM1Ce98hpNxsT0l

To be clear, as far as I know both kinds of URL always point to the same data, but the web URLs are preferable in some circumstances. For example, with this URL one can simply replace the itag value with any value listed in aitags to change the format. I mainly want these URLs extracted so I can pass them to other applications. I don't need yt-dlp to be able to download them natively (which is certainly possible but likely not worth the effort).
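In the sample web URL above, the signed parameter list (sparams) includes aitags but not itag, which is presumably why the itag value can be swapped without invalidating the signature. A minimal sketch of that swap using only the standard library; swap_itag is a hypothetical helper, and it refuses any itag not listed in the URL's own aitags.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def swap_itag(url: str, new_itag: str) -> str:
    """Return url with its itag query parameter replaced by new_itag.

    Hypothetical helper based on the sample web URL in this issue.
    """
    parts = urlsplit(url)
    # Keep the original parameter order; values come back percent-decoded.
    query = parse_qsl(parts.query, keep_blank_values=True)
    allowed = dict(query).get('aitags', '').split(',')
    if new_itag not in allowed:
        raise ValueError(f'itag {new_itag} is not in aitags {allowed}')
    new_query = [(k, new_itag if k == 'itag' else v) for k, v in query]
    # safe=',' keeps comma-separated lists (aitags, sparams) readable;
    # everything else is re-encoded, e.g. '/' -> %2F and '=' -> %3D.
    return urlunsplit(parts._replace(query=urlencode(new_query, safe=',')))
```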

yt-dlp could already do this, but a check on lines 3826-3827 of youtube.py explicitly prevents it:

if fmt.get('targetDurationSec'):
    continue

itag = str_or_none(fmt.get('itag'))
...
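One way the requested opt-in could look is sketched below, purely as an illustration: allow_live_adaptive is a hypothetical stand-in for a 'formats=broken'-style extractor argument, and iter_streaming_formats is not a real yt-dlp function. It reproduces the targetDurationSec skip from the excerpt and lets a caller turn it off.

```python
from typing import Iterable, Iterator


def iter_streaming_formats(
    streaming_formats: Iterable[dict], allow_live_adaptive: bool = False
) -> Iterator[dict]:
    """Yield adaptiveFormats entries, skipping live ones unless opted in.

    allow_live_adaptive is hypothetical; no such option exists in yt-dlp.
    """
    for fmt in streaming_formats:
        # Same condition as the excerpt: live entries carry targetDurationSec.
        if fmt.get('targetDurationSec') and not allow_live_adaptive:
            continue
        yield fmt
```

Formats let through this way would still be handed to the plain https downloader and fail, which is why the issue suggests an option name that discourages accidental use.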

If I remove the check, then yt-dlp jfKfPfyJRdk -f 248 --extractor-args 'youtube:player_client=web;formats=incomplete,duplicate' --print url prints the desired URL. Of course, trying to download this format fails, since yt-dlp treats it as a plain https format.

Would it be possible to add an argument to bypass this check? Perhaps something like formats=broken, so people don't use it by accident. If this is undesirable, saving the web URLs to a separate metadata field (something like fragment_base_web_url) in the regular DASH formats would be just as good, though it would be less straightforward to implement.
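The alternative proposal, a separate metadata field on the regular DASH formats, could be sketched as follows. fragment_base_web_url is the name suggested above, not an existing yt-dlp field; the helper name attach_web_urls is hypothetical, and so is the assumption that duplicated DASH format ids look like 248-dash (as in the -f 248-dash command earlier), so that the itag is the part before the first hyphen.

```python
from typing import Dict, List


def attach_web_urls(dash_formats: List[dict], web_urls_by_itag: Dict[str, str]) -> List[dict]:
    """Copy web-player URLs onto matching DASH formats (in place).

    'fragment_base_web_url' is the field name proposed in this issue;
    it is not an existing yt-dlp field.
    """
    for fmt in dash_formats:
        # Assumed id shape: '248-dash' -> itag '248'.
        itag = (fmt.get('format_id') or '').split('-', 1)[0]
        web_url = web_urls_by_itag.get(itag)
        if web_url:
            fmt['fragment_base_web_url'] = web_url
    return dash_formats
```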

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

> yt-dlp -vU --extractor-args 'youtube:player_client=web;formats=incomplete,duplicate' --print url -f 248 jfKfPfyJRdk
[debug] Command-line config: ['-vU', '--extractor-args', 'youtube:player_client=web;formats=incomplete,duplicate', '--print', 'url', '-f', '248', 'jfKfPfyJRdk']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp-nightly-builds [db50f19d7] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.19045-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg git-2020-07-29-cbb6ba2, ffprobe git-2020-07-29-cbb6ba2, phantomjs 2.1.1
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, curl_cffi-0.5.10, mutagen-1.47.0, requests-2.32.3, sqlite3-3.35.5, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets, curl_cffi
[debug] Loaded 1820 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp-nightly-builds
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp-nightly-builds)
[youtube] Extracting URL: jfKfPfyJRdk
[youtube] jfKfPfyJRdk: Downloading webpage
[youtube] jfKfPfyJRdk: Downloading m3u8 information
[youtube] jfKfPfyJRdk: Downloading MPD manifest
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
ERROR: [youtube] jfKfPfyJRdk: Requested format is not available. Use --list-formats for a list of available formats
Traceback (most recent call last):
  File "yt_dlp\YoutubeDL.py", line 1606, in wrapper
  File "yt_dlp\YoutubeDL.py", line 1762, in __extract_info
  File "yt_dlp\YoutubeDL.py", line 1821, in process_ie_result
  File "yt_dlp\YoutubeDL.py", line 2955, in process_video_result
yt_dlp.utils.ExtractorError: [youtube] jfKfPfyJRdk: Requested format is not available. Use --list-formats for a list of available formats
@fren-archive fren-archive added site-enhancement Feature request for some website triage Untriaged issue labels Jun 10, 2024
@bashonly
Member

Do you want the DASH formats that are extracted when you pass --live-from-start?

@fren-archive
Author

No. From what I can tell, the DASH URLs extracted with --live-from-start are generated from the manifest, not from adaptiveFormats. They lack the aitags parameter and use slashes rather than & and = to separate parameters.
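The difference in encoding can be made concrete with two small parsers, one per URL style. This is a sketch based only on the sample URLs in this issue: manifest_url_params and web_url_params are hypothetical names, and the assumption that manifest URLs are strictly /videoplayback/key/value/... pairs comes from the example above. Comparing the parsed keys shows which parameter the manifest form lacks.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def manifest_url_params(url: str) -> dict:
    """Parse a manifest-style URL: /videoplayback/key/value/key/value/..."""
    path = urlsplit(url).path  # still percent-encoded, so %2F stays inside one segment
    prefix = '/videoplayback/'
    if not path.startswith(prefix):
        raise ValueError('not a /videoplayback/ URL')
    segments = [s for s in path[len(prefix):].split('/') if s]
    if len(segments) % 2:
        raise ValueError('expected key/value pairs in the path')
    return {k: unquote(v) for k, v in zip(segments[::2], segments[1::2])}


def web_url_params(url: str) -> dict:
    """Parse a web-player URL: /videoplayback?key=value&key=value..."""
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
```

Note that the two forms also sign different parameter sets (itag appears in the manifest URL's sparams but not in the web URL's), so re-encoding one form as the other would not produce a valid URL.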
