Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed JSON subtitles as Matroska attachments #9991

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Riteo
Copy link
Contributor

@Riteo Riteo commented May 21, 2024

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

The --embed-subs option currently skips JSON files, such as Youtube's live chat data, as they have no standard structure and FFmpeg complains. This is a bit of a bummer as, to my best knowledge, they're the last hurdle left for single-file downloads, which I use for my archives.

This PR allows them to be embedded as regular attachments if using mkv/mka files, aided by the use of metadata stream specifiers1. These kind specifiers allow to be more precise and simplify the logic as we don't have to guess the stream number.

It also includes an improvement for the metadata post processor by using this stream selector type so that we don't accidentally override other JSON data (such as JSON subs) when embedding the info JSON.

Fixes #9957.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Core bug fix/improvement

Footnotes

  1. m:key[:value], see https://ffmpeg.org//ffmpeg.html#toc-Stream-specifiers-1 .

@Riteo
Copy link
Contributor Author

Riteo commented May 21, 2024

Tests run fine on my machine but the CI complains about a self signed certificate for the proxy tests, not sure what to do here :/

@Riteo
Copy link
Contributor Author

Riteo commented May 22, 2024

Rebased. Hopefully tests will pass now, as I'm seeing some changes to requests.

Update: They did!

@seproDev seproDev added the enhancement New feature or request label May 22, 2024
The old stream index specifiers would indiscriminately select any JSON
attachment, which made stuff like embedding live chat json data risky if
not impossible.

Also adds `-copy_unknown` as JSON data is "unknown" according to FFmpeg
(since it has no codec id) and thus would otherwise be rejected by
default.
Since we can't embed them as regular subtitles (due to them not having
any consistent structure), we embed them as file attachments, if
exporting as Matroska.

This allows us to have single-file downloads with everything embedded
for e.g. archival purposes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Embed JSON subtitles as file attachments
2 participants