CBC News Videos Extractor Not Working: "Unable to download XML: HTTP Error 404: Not Found" #10170

LifesGottaBeFun · 2024-06-13T03:55:31Z

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

I'm reporting that yt-dlp is broken on a supported site
I've verified that I have updated yt-dlp to nightly or master (update instructions)
I've checked that all provided URLs are playable in a browser with the same IP and same login details
I've checked that all URLs and arguments with special characters are properly quoted or escaped
I've searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
I've read the guidelines for opening an issue
I've read about sharing account credentials and I'm willing to share it if required

Region

Non-Geoblocked

Provide a description that is worded well enough to be understood

I tried to download this video: https://www.cbc.ca/player/play/video/9.6420651

However, it failed and gave me the "Unable to download XML: HTTP Error 404: Not Found" error.

Provide verbose output that clearly demonstrates the problem

Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
If using API, add 'verbose': True to YoutubeDL params instead
Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['https://www.cbc.ca/player/play/video/9.6420651', '-o', 'D:\\Downloaded Audio-Video Tracks\\ViaYouTubeDL\\cbc.ca\\Custom\\%(title)s-%(id)s.%(ext)s', '-o', 'D:/EdmontonAirMonitoring.mp4', '-vU']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out cp1252 (No VT), error cp1252 (No VT), screen cp1252 (No VT)
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp [12b248ce6] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.22621-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg 6.0-essentials_build-www.gyan.dev (setts), ffprobe 6.0-essentials_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, curl_cffi-0.5.10, mutagen-1.47.0, requests-2.32.2, sqlite3-3.35.5, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets, curl_cffi
[debug] Loaded 1820 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp)
[cbc.ca:player] Extracting URL: https://www.cbc.ca/player/play/video/9.6420651
[cbc.ca:player] 9.6420651: Downloading webpage
[ThePlatform] Extracting URL: http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/None?mbr=true&formats=MPEG4,FLV,MP3#__youtubedl_smuggle=%7B%22force_smil_url%22%3A+true%7D
[ThePlatform] None: Downloading SMIL data
[ThePlatform] None: Unable to download XML: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)
  File "yt_dlp\extractor\common.py", line 734, in extract
  File "yt_dlp\extractor\theplatform.py", line 313, in _real_extract
  File "yt_dlp\extractor\theplatform.py", line 34, in _extract_theplatform_smil
  File "yt_dlp\extractor\common.py", line 1133, in download_content
  File "yt_dlp\extractor\common.py", line 1093, in download_handle
  File "yt_dlp\extractor\adobepass.py", line 1366, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 954, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 903, in _request_webpage
  File "yt_dlp\extractor\common.py", line 890, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 4142, in urlopen
  File "yt_dlp\networking\common.py", line 117, in send
  File "yt_dlp\networking\_helper.py", line 208, in wrapper
  File "yt_dlp\networking\common.py", line 337, in send
  File "yt_dlp\networking\_requests.py", line 366, in _send
yt_dlp.networking.exceptions.HTTPError: HTTP Error 404: Not Found
An error occured

trainman261 · 2024-06-17T20:37:56Z

I've noticed the same problem. It seems like #9534 was a precursor to this. As far as I can tell, the MediaID key, which the extractor used to fetch the video files from ThePlatform, no longer exists. Looking through how the site works now, I can't find any reference to ThePlatform at all (although I am a bit of a noob at this, so feel free to tell me I'm wrong).
What definitely works (I tried it manually with success) is:

Load the webpage and search for the JSON that follows <script id="initialStateDom">window.__INITIAL_STATE__ = (the extractor already does this).
In that extracted JSON, look under video for all the info we need, including metadata. Most importantly for the video itself, we need video/currentClip/media/assets/key, which links to another block of JSON.
In that second block of JSON, grab the url key, which links to the master m3u8 file containing all the info needed. I've tried feeding that URL to yt-dlp and it works, using the generic extractor. Note that the link expires quickly (after no more than a few minutes, IIRC).
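
The steps above could be sketched roughly as follows. This is only an illustration of the manual procedure, not the extractor's actual code: the function names and the sample HTML are hypothetical, the thread does not say whether assets is a single object or a list (so both are accepted here), and the network requests for the page and for the second block of JSON are left out.

```python
import json
import re

# Marker named in the steps above; the JSON object starts right after it.
MARKER = re.compile(r'window\.__INITIAL_STATE__\s*=\s*')

# Hypothetical, heavily trimmed stand-in for a real CBC player page.
SAMPLE_HTML = (
    '<script id="initialStateDom">window.__INITIAL_STATE__ = '
    '{"video": {"currentClip": {"title": "Sample", "media": '
    '{"assets": {"key": "https://example.invalid/asset.json"}}}}};</script>'
)


def extract_initial_state(html):
    """Return the JSON object assigned to window.__INITIAL_STATE__."""
    match = MARKER.search(html)
    if match is None:
        raise ValueError('window.__INITIAL_STATE__ not found in page')
    # raw_decode parses one JSON value and ignores whatever follows it
    # (here the trailing ";</script>").
    state, _end = json.JSONDecoder().raw_decode(html, match.end())
    return state


def asset_keys(state):
    """Collect the values found at video/currentClip/media/assets/key."""
    node = state
    for name in ('video', 'currentClip', 'media', 'assets'):
        node = node.get(name) if isinstance(node, dict) else None
    # Assumption: assets may be one object or a list of objects.
    if isinstance(node, dict):
        node = [node]
    return [a['key'] for a in node or []
            if isinstance(a, dict) and a.get('key')]


def master_playlist_url(asset_json):
    """Return the url key of the second JSON block (the master m3u8 link)."""
    return asset_json.get('url')
```

In use, each value returned by asset_keys would be fetched as JSON and passed to master_playlist_url, and the resulting m3u8 link handed to yt-dlp right away, since it expires within minutes.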

By analyzing the traffic and (for MP4s) experimenting with the URLs pulled, I've also found that the complete TS, MP4, and VTT files are directly accessible. Since then I've found that the direct link to the VTT file can be extracted from the first block of JSON, but I'm still looking for a solid pattern for the TS and MP4 files.

The first option is the most straightforward, but it works via HLS and tends to download ~30 files per minute of video (~45 if you add subtitles), meaning ~2000 files for a 45-minute video with subtitles. The second option would be a nice addition, but is somewhat more complex.

I'll try to convert the first option into code within the coming week - but if someone else gets around to it sooner, feel free to go ahead.

trainman261 · 2024-06-20T20:42:47Z

Update: I've gotten around to implementing a rudimentary solution and have pushed it to a branch on my dev fork. It works on my end, and if someone needs a stopgap, feel free to use it until I polish it up and submit a PR.

LifesGottaBeFun added the site-bug (Issue with a specific website) and triage (Untriaged issue) labels on Jun 13, 2024
