CNNArticleIE Extractor: Update to recognize modern video links #10185

Open
wants to merge 1 commit into
base: master

kylegustavo
Contributor

Example URLs:

Landing pages:
https://www.cnn.com/videos
https://www.cnn.com/videos/
https://edition.cnn.com/videos

Specific Videos:
https://www.cnn.com/2024/05/31/sport/video/jadon-sancho-borussia-dortmund-champions-league-exclusive-spt-intl
https://edition.cnn.com/2024/06/11/politics/video/inmates-vote-jail-nevada-murray-dnt-ac360-digvid
https://www.cnn.com/2024/06/11/style/video/king-charles-portrait-vandalized-activists-foster-intl-digvid

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

CNNArticleIE currently matches these URLs but fails to extract the video. This PR updates CNNArticleIE to work with the way most CNN video links are now embedded, and updates the tests to include some of these links. URLs of this type can just use the default extractor. It also updates the regex to capture a date or text before the /video/ subcategory, since many video links are structured this way, and removes an old test whose article no longer has any media available.
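
To illustrate the URL shape described above, here is a minimal sketch of a pattern that accepts a date or text prefix before `/video/`. It is not the `_VALID_URL` from this PR: the pattern itself and the names `VIDEO_URL_RE` and `parse_video_url` are assumptions made for the example, and it only covers the specific-video URLs, not the landing pages.

```python
import re

# Hypothetical pattern, for illustration only (not yt-dlp's actual regex):
# an optional www/edition subdomain, any number of path segments (a date
# and/or section text), then /video/<slug>.
VIDEO_URL_RE = re.compile(
    r'https?://(?:(?:www|edition)\.)?cnn\.com/'
    r'(?P<prefix>(?:[^/?#]+/)*?)video/(?P<slug>[^/?#]+)/?(?:[?#].*)?$'
)


def parse_video_url(url):
    """Return the path prefix and slug of a CNN video URL, or None."""
    m = VIDEO_URL_RE.match(url)
    if m is None:
        return None
    return {
        'prefix': m.group('prefix').rstrip('/'),
        'slug': m.group('slug'),
    }
```

With this sketch, `parse_video_url` returns a dict for each of the three specific-video URLs listed above and `None` for the `/videos` landing pages, which lack the `/video/<slug>` part.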

This PR also removes the CNNBlogsIE extractor, as the CNN Blogs website is now defunct and none of the old links are active anymore.

Fixes #9719

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

kylegustavo marked this pull request as draft June 14, 2024 20:34
kylegustavo force-pushed the kg.CNNArticleIEUpdate branch 2 times, most recently from 17fe959 to 3cb869f June 14, 2024 23:35
seproDev added the site-bug (Issue with a specific website) label Jun 14, 2024
kylegustavo marked this pull request as ready for review June 17, 2024 16:30
Successfully merging this pull request may close these issues.

CNN Extractor CNNArticleIE is not working as expected