The byte string format goes over the specified byte value if it's in the middle of a multi-byte UTF-8 codepoint #10060

Open
10 tasks done
nurupo opened this issue May 29, 2024 · 0 comments · May be fixed by #10068
Labels: bug (Bug that is not site-specific), core-triage (triage requested from a core dev)

nurupo commented May 29, 2024

Provide a description that is worded well enough to be understood

I want to limit the filename to at most 68 bytes. The byte string format is supposed to do that when I specify -o '%(title).68B', yet the filename I get is 70 bytes.

$ yt-dlp --simulate -o '%(title).68B' --print filename https://www.youtube.com/watch?v=m9D15U6tBG8
踊 - Ado ⧸ covered by キズナアイ(ブラックアイ)【歌
$ echo -n '踊 - Ado ⧸ covered by キズナアイ(ブラックアイ)【歌' | wc -c
70
$ yt-dlp --simulate -o '%(title).68B' --print filename https://www.youtube.com/watch?v=m9D15U6tBG8 | tr -d '\n' | wc -c
70
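
For reference, the behavior expected from a byte-limited format can be sketched in a few lines of Python. This is only an illustration of the idea, not yt-dlp's actual code: `truncate_utf8` and `utf8_len` are hypothetical helper names, and the sample string is made up. The sketch cuts the UTF-8 encoding at the byte limit and drops any multi-byte sequence left incomplete at the end, so the result never exceeds the limit.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper for illustration; not taken from yt-dlp.
    """
    encoded = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so only the tail can be broken by the slice.
    # errors="ignore" drops that trailing incomplete multi-byte sequence.
    return encoded.decode("utf-8", errors="ignore")


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "歌a歌"  # 3 + 1 + 3 = 7 bytes in UTF-8
    for limit in range(utf8_len(sample) + 1):
        cut = truncate_utf8(sample, limit)
        # The byte length of the result is always <= limit.
        print(limit, repr(cut), utf8_len(cut))
```

With a helper like this, a limit that falls inside a multi-byte character yields a shorter string rather than a longer one, which is why a 70-byte result for a 68-byte limit is surprising.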

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

[debug] Command-line config: ['-vU', '--no-config', '--simulate', '-o', '%(title).68B', '--print', 'filename', 'https://www.youtube.com/watch?v=m9D15U6tBG8']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp [12b248ce6] (pip)
[debug] Python 3.9.2 (CPython x86_64 64bit) - Linux-5.10.0-26-amd64-x86_64-with-glibc2.31 (OpenSSL 1.1.1w  11 Sep 2023, glibc 2.31)
[debug] exe versions: ffmpeg 4.3.6-0, ffprobe 4.3.6-0, phantomjs broken
[debug] Optional libraries: Cryptodome-3.14.1, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, requests-2.31.0, sqlite3-3.34.1, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1820 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp)
[youtube] Extracting URL: https://www.youtube.com/watch?v=m9D15U6tBG8
[youtube] m9D15U6tBG8: Downloading webpage
[youtube] m9D15U6tBG8: Downloading ios player API JSON
[debug] [youtube] Extracting signature function js_bc657243_100
[youtube] m9D15U6tBG8: Downloading player bc657243
[debug] Saving youtube-sigfuncs.js_bc657243_100 to cache
[debug] Loading youtube-nsig.bc657243 from cache
[debug] [youtube] Decrypted nsig hv3wzOlkm_fcEJi39 => e9aHlq7Rb5og-Q
[debug] [youtube] Extracting signature function js_bc657243_104
[debug] Loading youtube-sigfuncs.js_bc657243_104 from cache
[debug] Loading youtube-nsig.bc657243 from cache
[debug] [youtube] Decrypted nsig n-KNdAtGjF15rF735 => 1ZXeYgsNR1wlPQ
[youtube] m9D15U6tBG8: Downloading m3u8 information
[debug] [youtube] Invalid start time (1140.0 > 209) for chapter "日程: 2021年6月30日(水) 開場 18:00 / 開演"
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] m9D15U6tBG8: Downloading 1 format(s): 616+251
踊 - Ado ⧸ covered by キズナアイ(ブラックアイ)【歌
nurupo added the bug (Bug that is not site-specific) and triage (Untriaged issue) labels on May 29, 2024
nurupo added a commit to nurupo/yt-dlp that referenced this issue May 30, 2024
The byte string-format should be applied after the sanitization is done,
as sanitize might replace a single byte character with a multi-byte one,
e.g. '/' with '⧸', making the resulting string go over the desired byte
limit.

Fixes yt-dlp#10060
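
The ordering problem described in this commit message can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: `sanitize` is a hypothetical stand-in that only replaces '/' with '⧸' (U+29F8, 3 bytes in UTF-8), `truncate_utf8` is a hypothetical byte-limit helper, and the title and limit are made up. None of it is yt-dlp's real code.

```python
BIG_SOLIDUS = "\u29f8"  # '⧸', encoded as 3 bytes in UTF-8


def sanitize(text: str) -> str:
    """Hypothetical stand-in for filename sanitization: '/' becomes '⧸'."""
    return text.replace("/", BIG_SOLIDUS)


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Hypothetical helper: cut to max_bytes of UTF-8 without splitting a character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def truncate_then_sanitize(title: str, limit: int) -> str:
    """Order described as buggy: the limit is applied before the string grows."""
    return sanitize(truncate_utf8(title, limit))


def sanitize_then_truncate(title: str, limit: int) -> str:
    """Order proposed in the commit message: sanitize first, then apply the limit."""
    return truncate_utf8(sanitize(title), limit)


if __name__ == "__main__":
    title, limit = "Ado / cover", 8
    for func in (truncate_then_sanitize, sanitize_then_truncate):
        name = func(title, limit)
        print(func.__name__, repr(name), len(name.encode("utf-8")))
```

In this sketch, truncating first keeps the 1-byte '/' inside the 8-byte budget, and the later replacement adds 2 bytes, giving 10 bytes. That mirrors the 68-versus-70 discrepancy reported above. Sanitizing first lets the limit see the final byte count.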
bashonly added the core-triage (triage requested from a core dev) label and removed the triage (Untriaged issue) label on May 31, 2024