[TV5MONDE] Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: Forbidden>) #10153

Open
10 of 11 tasks
Ottaviocr opened this issue Jun 10, 2024 · 9 comments
Labels
geo-blocked (Content is geo-blocked), site-bug (Issue with a specific website)

Comments

@Ottaviocr

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

United Kingdom

Provide a description that is worded well enough to be understood

Downloading videos from TV5MONDE fails with "HTTP Error 403: Forbidden". See the verbose output below. Adding "--cookies-from-browser firefox" doesn't help. I can play the video fine in the browser, and it does not require login.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', 'http://www.tv5monde.com/tv/video/72384-le-journal-de-la-rtbf-edition-du-10-06-24-13h']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp-master-builds [db50f19d7] (zip)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.1.0-18-amd64-x86_64-with-glibc2.36 (OpenSSL 3.0.11 19 Sep 2023, glibc 2.36)
[debug] exe versions: ffmpeg 5.1.4-0 (setts), ffprobe 5.1.4-0
[debug] Optional libraries: brotli-1.0.9, certifi-2022.09.24, requests-2.28.1, sqlite3-3.40.1, urllib3-1.26.12
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1820 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-master-builds/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp-master-builds
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp-master-builds)
[TV5MONDE] Extracting URL: http://www.tv5monde.com/tv/video/72384-le-journal-de-la-rtbf-edition-du-10-06-24-13h
[TV5MONDE] 72384-le-journal-de-la-rtbf-edition-du-10-06-24-13h: Downloading webpage
ERROR: [TV5MONDE] 72384-le-journal-de-la-rtbf-edition-du-10-06-24-13h: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: Forbidden>)
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/common.py", line 734, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/tv5mondeplus.py", line 99, in _real_extract
    webpage = self._download_webpage(url, display_id)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/common.py", line 1182, in _download_webpage
    return self.__download_webpage(url_or_request, video_id, note, errnote, None, fatal, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/common.py", line 1133, in download_content
    res = getattr(self, download_handle.__name__)(url_or_request, video_id, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/common.py", line 954, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/common.py", line 903, in _request_webpage
    raise ExtractorError(errmsg, cause=err)

  File "/home/oc/opt/bin/yt-dlp/yt_dlp/networking/_urllib.py", line 396, in _send
    res = opener.open(urllib_req, timeout=self._calculate_timeout(request))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 557, in error
    result = self._call_chain(*args)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 749, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/extractor/common.py", line 890, in _request_webpage
    return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query, extensions))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 4142, in urlopen
    return self._request_director.send(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/networking/common.py", line 117, in send
    response = handler.send(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/networking/_helper.py", line 208, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/networking/common.py", line 337, in send
    return self._send(request)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/oc/opt/bin/yt-dlp/yt_dlp/networking/_urllib.py", line 401, in _send
    raise HTTPError(UrllibResponseAdapter(e.fp), redirect_loop='redirect error' in str(e)) from e
yt_dlp.networking.exceptions.HTTPError: HTTP Error 403: Forbidden
Ottaviocr added the site-bug (Issue with a specific website) and triage (Untriaged issue) labels Jun 10, 2024
@bashonly
Member

Looks like this may be caused by akamai TLS fingerprinting / anti-bot protection.

Try installing curl_cffi and adding --impersonate chrome to your command
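As a rough sketch of the suggestion above (not part of the maintainer's reply), the following Python snippet checks whether curl_cffi is importable in the current environment and prints the command with `--impersonate chrome` added. The helper names `build_command` and `curl_cffi_available` are hypothetical, and the pip hint is an assumption: yt-dlp only supports certain curl_cffi versions, so the exact install command may differ.

```python
import importlib.util
import shlex


def build_command(url, target="chrome"):
    """Return a yt-dlp argument list that requests browser impersonation."""
    return ["yt-dlp", "--impersonate", target, url]


def curl_cffi_available():
    """Report whether curl_cffi can be imported from this Python environment."""
    # find_spec returns None when the package is not installed here
    return importlib.util.find_spec("curl_cffi") is not None


# URL taken from the verbose log in the original report
url = "http://www.tv5monde.com/tv/video/72384-le-journal-de-la-rtbf-edition-du-10-06-24-13h"
cmd = build_command(url)

if not curl_cffi_available():
    # Assumption: plain pip install; the supported version range may be narrower
    print("curl_cffi not found; try: python3 -m pip install curl_cffi")

# Print the command rather than running it, so nothing is downloaded here
print(shlex.join(cmd))
```

The check only matters for builds that import dependencies from the local environment, such as the zip build shown in the log.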

bashonly added the geo-blocked (Content is geo-blocked) label Jun 10, 2024
@LateLament

Looks like this may be caused by akamai TLS fingerprinting / anti-bot protection.

Try installing curl_cffi and adding --impersonate chrome to your command

Works for me. Thx

@Ottaviocr
Author

Can I compile yt-dlp from source with curl_cffi built in? If so, how?

@bashonly
Member

bashonly commented Jun 18, 2024

Can I compile yt-dlp from source with curl_cffi built in? If so, how?

Yes, but you can't include curl_cffi in the unix zipfile build (which you were using in your original log), since it imports python dependencies from your environment.

You can include it in a pyinstaller build, though. If you're on linux, you can run this from the project root directory:

git restore yt_dlp/version.py
git checkout master && git pull  # if you want to build from a different branch/tag/commit, skip this
make clean
python3 -m venv .venv
source .venv/bin/activate
python3 devscripts/install_deps.py -o -i build
python3 devscripts/install_deps.py -i pyinstaller -i secretstorage -i curl-cffi
python3 devscripts/update-version.py
python3 devscripts/make_lazy_extractors.py
python3 -m bundle.pyinstaller

@Ottaviocr
Author

Ottaviocr commented Jun 18, 2024 via email

@bashonly
Member

bashonly commented Jun 18, 2024

relative path of the binary should be dist/yt-dlp_linux

@Ottaviocr
Author

Ottaviocr commented Jun 19, 2024 via email

@bashonly
Member

bashonly commented Jun 19, 2024

  1. Do I have to follow the above procedure (which I just copied and pasted as I have no idea what it does) every time I want a new version of yt-dlp?

If you want a binary with curl_cffi included in it, yes. As an alternative, you could install yt-dlp via pipx or with pip (in a virtual environment), which would make updating less painful.

  1. Could the Linux binary theoretically be shipped with curl-cffi by default?

Theoretically, yes. But practically, it's not possible right now. Our yt-dlp_linux binary needs to maintain compatibility with old Linux systems (glibc 2.17), and our current solution to this is shipping a statically linked musl libc binary. curl_cffi doesn't offer musllinux wheels for the version we are currently pinning (0.5.10), but it does for the latest 0.7.x series of releases. The problem is that there has not yet been a stable 0.7.x release of curl_cffi; it has been beta/pre-release only so far.

  1. If so, can I raise a feature request?

Not necessary. We hope to include curl_cffi in the linux binaries at some point in the future, but either we'll need curl_cffi to publish a stable 0.7.0 release (for the musllinux wheels) or we'll need to revamp our whole Linux build flow. (It's likely we'll end up doing the latter anyways.) The one certainty is that it's going to take some time.
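For the pip-in-a-virtual-environment alternative mentioned in the first answer, a minimal sketch of the commands might look like the following. It only prints them and runs nothing. The helper name `venv_install_commands` and the `.venv` directory are hypothetical, the `bin/python` path assumes Linux, and the `curl-cffi` extra is assumed to match the dependency group name used with `install_deps.py` in the build steps earlier in the thread.

```python
import shlex
from pathlib import Path


def venv_install_commands(venv_dir=".venv"):
    """Return the commands to create a venv and install yt-dlp with curl_cffi."""
    # Assumption: Linux layout, where the interpreter lives under bin/
    venv_python = Path(venv_dir) / "bin" / "python"
    return [
        ["python3", "-m", "venv", venv_dir],
        # Re-running this second command later would also serve as the update step
        [str(venv_python), "-m", "pip", "install", "--upgrade",
         "yt-dlp[default,curl-cffi]"],
    ]


commands = venv_install_commands()
for command in commands:
    # shlex.join quotes arguments such as the bracketed extras for the shell
    print(shlex.join(command))
```

With this approach, updating would mean re-running the second command rather than repeating the full build.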

@Ottaviocr
Author

Ottaviocr commented Jun 20, 2024 via email

Projects
Status: works with curl-impersonate