info dict is not picklable #10140

Open
10 tasks done
szc126 opened this issue Jun 9, 2024 · 1 comment
Labels: bug (Bug that is not site-specific), triage (Untriaged issue)

Comments


szc126 commented Jun 9, 2024


Provide a description that is worded well enough to be understood

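Info dicts cannot be round-tripped through pickle since 2023.9.24(?): the dump succeeds, but loading it back fails (see the terminal output below). The patch and remarks that follow concern the HTTP headers: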

diff --git a/yt_dlp/utils/networking.py b/yt_dlp/utils/networking.py
index ba0493cc2..7e369cbb4 100644
--- a/yt_dlp/utils/networking.py
+++ b/yt_dlp/utils/networking.py
@@ -57,6 +57,11 @@ class HTTPHeaderDict(collections.UserDict, dict):
     The constructor can take multiple dicts, in which keys in the latter are prioritised.
     """
 
+    def __new__(cls, *args, **kwargs):
+        obj = super().__new__(cls, *args, **kwargs)
+        obj.data = {}
+        return obj
+
     def __init__(self, *args, **kwargs):
         super().__init__()
         for dct in args:

I did have a patch for the HTTP headers, which are what currently breaks picklability across the board.
You could create an issue, but there are other extractor-specific culprits that also result in unpicklable info dicts.
We also have plans to implement a "lazy info dict", which would almost certainly not be picklable in any way.
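
The effect of the patch can be tried in isolation. The following is a minimal sketch, not the yt-dlp class itself: `PatchedHeaderDictSketch` is a hypothetical stand-in that copies only the base classes, the `__new__` from the diff, and the `__setitem__` line visible in the traceback. It assumes pickle protocol 2 or newer, where the object is rebuilt through `__new__` without `__init__` being called.

```python
import collections
import pickle


class PatchedHeaderDictSketch(collections.UserDict, dict):
    """Hypothetical stand-in for HTTPHeaderDict with the patch applied."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        # Unpickling calls __new__ but not __init__, so `data` must exist here.
        obj.data = {}
        return obj

    def __init__(self, *args, **kwargs):
        super().__init__()
        for dct in args:
            if dct is not None:
                self.update(dct)
        self.update(kwargs)

    def __setitem__(self, key, value):
        # Same normalisation as the line shown in the traceback.
        super().__setitem__(key.title(), str(value).strip())


headers = PatchedHeaderDictSketch({'user-agent': ' demo '}, accept='text/html')

# Protocols 2 and up recreate the object via cls.__new__(cls).
round_trips = {
    protocol: pickle.loads(pickle.dumps(headers, protocol=protocol))
    for protocol in range(2, pickle.HIGHEST_PROTOCOL + 1)
}

for protocol, restored in round_trips.items():
    print(protocol, type(restored).__name__, dict(restored.items()))
```

The empty `data` created in `__new__` is only a placeholder: once the items have been written back, pickle restores the saved instance attributes and replaces it with the stored mapping.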

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

/tmp % cat a.py
#!/usr/bin/env python3

import yt_dlp
from diskcache import Cache

with yt_dlp.YoutubeDL({'simulate': True, 'verbose': True}) as ytdl:
    for url in ['https://www.youtube.com/watch?v=aqz-KE-bpKQ']:
        cache = Cache('/tmp/cache-foo')
        info = cache.memoize()(ytdl.extract_info)(url)
        print(info['title'], info['id'])
/tmp % venv2024/bin/python a.py
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp-nightly-builds [db50f19d7] (pip) API
[debug] params: {'simulate': True, 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.24 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.12.3 (CPython x86_64 64bit) - Linux-6.8.9-zen1-2-zen-x86_64-with-glibc2.39 (OpenSSL 3.3.0 9 Apr 2024, glibc 2.39)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.06.02, mutagen-1.47.0, requests-2.32.3, sqlite3-3.45.3, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1820 extractors
[youtube] Extracting URL: https://www.youtube.com/watch?v=aqz-KE-bpKQ
[youtube] aqz-KE-bpKQ: Downloading webpage
[youtube] aqz-KE-bpKQ: Downloading ios player API JSON
[debug] Loading youtube-nsig.dee49cfa from cache
[debug] [youtube] Decrypted nsig 1YjASRg2Zdx68ysbJ => Lh9ZvjeSmFeVwA
[debug] Loading youtube-nsig.dee49cfa from cache
[debug] [youtube] Decrypted nsig CtoR8uSVLIr_VY4Mb => splbT6_MhpnmFw
[youtube] aqz-KE-bpKQ: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] aqz-KE-bpKQ: Downloading 1 format(s): 315+258
Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film aqz-KE-bpKQ
/tmp % venv2024/bin/python a.py
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp-nightly-builds [db50f19d7] (pip) API
[debug] params: {'simulate': True, 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.74 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.12.3 (CPython x86_64 64bit) - Linux-6.8.9-zen1-2-zen-x86_64-with-glibc2.39 (OpenSSL 3.3.0 9 Apr 2024, glibc 2.39)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.06.02, mutagen-1.47.0, requests-2.32.3, sqlite3-3.45.3, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1820 extractors
Traceback (most recent call last):
  File "/tmp/a.py", line 9, in <module>
    info = cache.memoize()(ytdl.extract_info)(url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv2024/lib/python3.12/site-packages/diskcache/core.py", line 1872, in wrapper
    result = self.get(key, default=ENOVAL, retry=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv2024/lib/python3.12/site-packages/diskcache/core.py", line 1173, in get
    value = self._disk.fetch(mode, filename, db_value, read)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv2024/lib/python3.12/site-packages/diskcache/core.py", line 282, in fetch
    return pickle.load(reader)
           ^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv2024/lib/python3.12/site-packages/yt_dlp/utils/networking.py", line 70, in __setitem__
    super().__setitem__(key.title(), str(value).strip())
  File "/usr/lib/python3.12/collections/__init__.py", line 1138, in __setitem__
    self.data[key] = item
    ^^^^^^^^^
AttributeError: 'HTTPHeaderDict' object has no attribute 'data'
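
The traceback can likely be reproduced without yt-dlp or diskcache. The sketch below is a reduction under stated assumptions: `HeaderDictSketch` is a hypothetical class that shares only the bases of `HTTPHeaderDict` and the `__setitem__` line shown above. Dumping works, which matches the first run succeeding. Loading fails because pickle recreates the object without calling `__init__` and writes the items back through `__setitem__` before the saved `data` attribute has been restored.

```python
import collections
import pickle


class HeaderDictSketch(collections.UserDict, dict):
    """Hypothetical stand-in for the unpatched HTTPHeaderDict."""

    def __init__(self, *args, **kwargs):
        super().__init__()  # UserDict.__init__ is what creates self.data
        for dct in args:
            if dct is not None:
                self.update(dct)
        self.update(kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(key.title(), str(value).strip())


headers = HeaderDictSketch({'user-agent': 'demo'})

# Writing the pickle succeeds, like the first run of a.py.
payload = pickle.dumps(headers)

# Reading it back fails, like the second run of a.py.
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__, error)
```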
szc126 added the bug (Bug that is not site-specific) and triage (Untriaged issue) labels Jun 9, 2024

Inc44 commented Jun 17, 2024

I have tested the suggested modifications to the HTTPHeaderDict class, and they resolved the issue.

Before applying the fix, running python -OO test.py resulted in the following error:

python -OO test.py
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8 
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp [12b248ce6] (pip) API 
[debug] params: {'simulate': True, 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.24 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}} 
[debug] Python 3.10.14 (CPython AMD64 64bit) - Windows-10-10.0.22631-SP0 (OpenSSL 3.0.13 30 Jan 2024) 
[debug] exe versions: ffmpeg 7.0.1-full_build-www.gyan.dev (setts), ffprobe 7.0.1-full_build-www.gyan.dev 
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.06.02, mutagen-1.47.0, requests-2.32.3, sqlite3-3.45.3, urllib3-2.2.1, websockets-12.0 
[debug] Proxy map: {} 
[debug] Request Handlers: urllib, requests, websockets 
[debug] Loaded 1820 extractors 
Traceback (most recent call last):
  File "C:\Users\pc\Desktop\test\test.py", line 7, in <module>
    info = cache.memoize()(ytdl.extract_info)(url)
  File "C:\ProgramData\miniconda3\envs\test\lib\site-packages\diskcache\core.py", line 1872, in wrapper
    result = self.get(key, default=ENOVAL, retry=True)
  File "C:\ProgramData\miniconda3\envs\test\lib\site-packages\diskcache\core.py", line 1173, in get
    value = self._disk.fetch(mode, filename, db_value, read)
  File "C:\ProgramData\miniconda3\envs\test\lib\site-packages\diskcache\core.py", line 282, in fetch
    return pickle.load(reader)
  File "C:\ProgramData\miniconda3\envs\test\lib\site-packages\yt_dlp\utils\networking.py", line 70, in __setitem__
    super().__setitem__(key.title(), str(value).strip())
  File "C:\ProgramData\miniconda3\envs\test\lib\collections\__init__.py", line 1109, in __setitem__
    self.data[key] = item
AttributeError: 'HTTPHeaderDict' object has no attribute 'data'

After applying the suggested fix:

class HTTPHeaderDict(collections.UserDict, dict):
    """
    Store and access keys case-insensitively.
    The constructor can take multiple dicts, in which keys in the latter are prioritised.
    """

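    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls, *args, **kwargs)
        # pickle recreates instances via __new__ without calling __init__,
        # so the UserDict storage has to exist before any item is set
        obj.data = {}
        return obj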

    def __init__(self, *args, **kwargs):
        super().__init__()
        for dct in args:
            if dct is not None:
                self.update(dct)
        self.update(kwargs)

The script ran successfully:

python -OO test.py   
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8 
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp [12b248ce6] (pip) API 
[debug] params: {'simulate': True, 'verbose': True, 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}} 
[debug] Python 3.10.14 (CPython AMD64 64bit) - Windows-10-10.0.22631-SP0 (OpenSSL 3.0.13 30 Jan 2024) 
[debug] exe versions: ffmpeg 7.0.1-full_build-www.gyan.dev (setts), ffprobe 7.0.1-full_build-www.gyan.dev 
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.06.02, mutagen-1.47.0, requests-2.32.3, sqlite3-3.45.3, urllib3-2.2.1, websockets-12.0 
[debug] Proxy map: {} 
[debug] Request Handlers: urllib, requests, websockets 
[debug] Loaded 1820 extractors 
Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film aqz-KE-bpKQ

This confirms that the proposed changes resolve the AttributeError. It would be beneficial to commit these changes to prevent this issue in the future.
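
Until such a change is committed, one possible workaround is to cache a plain copy of the info dict instead of the original. This is a sketch under assumptions, not something proposed in this thread: `to_plain` and `FragileHeaders` are hypothetical names, and the conversion only handles mappings, lists, and tuples, so other unpicklable values would still need separate treatment.

```python
import collections
import collections.abc
import pickle


class FragileHeaders(collections.UserDict, dict):
    """Hypothetical stand-in for a mapping that fails to unpickle."""

    def __setitem__(self, key, value):
        super().__setitem__(key.title(), value)


def to_plain(obj):
    """Recursively copy mappings into dict and lists/tuples into list."""
    if isinstance(obj, collections.abc.Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    return obj


# Demo data only; a real info dict has many more fields.
info = {
    'id': 'aqz-KE-bpKQ',
    'http_headers': FragileHeaders({'user-agent': 'demo'}),
    'formats': [
        {'format_id': '315', 'http_headers': FragileHeaders({'accept': '*/*'})},
    ],
}

plain = to_plain(info)
restored = pickle.loads(pickle.dumps(plain))

print(restored['http_headers'], type(restored['formats'][0]['http_headers']).__name__)
```

The copy loses the case-insensitive behaviour of the header mapping, which may or may not matter for a cache that is only read for metadata. yt-dlp also documents `YoutubeDL.sanitize_info(info)` for making an info dict JSON-serializable; whether its output suits a given cache is not verified here.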
