orf:on - keep or download reports as separate videos #10142

Open
8 of 10 tasks
arminfuerst opened this issue Jun 9, 2024 · 15 comments
Labels
patch-available: There is patch available that should fix this issue. Someone needs to make a PR with it
site-enhancement: Feature request for some website

Comments

@arminfuerst

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

Austria

Example URLs

https://on.orf.at/video/14229684

Provide a description that is worded well enough to be understood

Some ORF ON videos consist of separate reports ("Fernsehbeitrag" in German); a typical example is the news programme "Zeit im Bild". I'd like an option to get a separate video file for each report, as this helps with post-processing in some cases. For example, some reports end with advertisements; if the reports are separate video files, it is relatively easy to remove the advertisements and rejoin the remaining parts into one video file.
Currently all reports are stored as one video file. This works properly, and it would be good to keep this possibility, as there are also videos that need no post-processing.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

yt-dlp -vU https://on.orf.at/video/14229684
[debug] Command-line config: ['-vU', 'https://on.orf.at/video/14229684']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version [email protected] from yt-dlp/yt-dlp-nightly-builds [db50f19d7] (zip)
[debug] Python 3.11.2 (CPython x86_64 64bit) - Linux-6.1.0-21-amd64-x86_64-with-glibc2.36 (OpenSSL 3.0.11 19 Sep 2023, glibc 2.36)
[debug] exe versions: ffmpeg 6.0.1 (fdk,setts), ffprobe 6.0.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.11.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, requests-2.28.1, secretstorage-3.3.3, sqlite3-3.40.1, urllib3-1.26.12, websockets-10.4
[debug] Proxy map: {}
[debug] Request Handlers: urllib
[debug] Loaded 1820 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
Latest version: [email protected] from yt-dlp/yt-dlp-nightly-builds
yt-dlp is up to date ([email protected] from yt-dlp/yt-dlp-nightly-builds)
[orf:on] Extracting URL: https://on.orf.at/video/14229684
[orf:on] 14229684: Downloading JSON metadata
[orf:on] 14229684: Downloading m3u8 information
[orf:on] 14229684: Downloading m3u8 information
[orf:on] 14229684: Downloading MPD manifest
[orf:on] 14229684: Downloading MPD manifest
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 14229684: Downloading 1 format(s): dash-p0va0br3003654-1+dash-p0aa0br192000-1
[debug] Invoking dashsegments downloader on "https://vod-ww.mdn.ors.at/cms-worldwide_episodes_nas/_definst_/nas/cms-worldwide_episodes/online/14229684_0014_hr.smil/manifest.mpd"
[dashsegments] Total fragments: 115
[download] Destination: ZIB 1 vom 08.06.2024 [14229684].fdash-p0va0br3003654-1.mp4
[download] 100% of  407.92MiB in 00:00:54 at 7.53MiB/s
[debug] Invoking dashsegments downloader on "https://vod-ww.mdn.ors.at/cms-worldwide_episodes_nas/_definst_/nas/cms-worldwide_episodes/online/14229684_0014_hr.smil/manifest.mpd"
[dashsegments] Total fragments: 115
[download] Destination: ZIB 1 vom 08.06.2024 [14229684].fdash-p0aa0br192000-1.m4a
[download] 100% of   26.47MiB in 00:00:12 at 2.15MiB/s
[Merger] Merging formats into "ZIB 1 vom 08.06.2024 [14229684].mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:ZIB 1 vom 08.06.2024 [14229684].fdash-p0va0br3003654-1.mp4' -i 'file:ZIB 1 vom 08.06.2024 [14229684].fdash-p0aa0br192000-1.m4a' -c copy -map 0:v:0 -map 1:a:0 -movflags +faststart 'file:ZIB 1 vom 08.06.2024 [14229684].temp.mp4'
Deleting original file ZIB 1 vom 08.06.2024 [14229684].fdash-p0aa0br192000-1.m4a (pass -k to keep)
Deleting original file ZIB 1 vom 08.06.2024 [14229684].fdash-p0va0br3003654-1.mp4 (pass -k to keep)
arminfuerst added the site-enhancement (Feature request for some website) and triage (Untriaged issue) labels Jun 9, 2024
@dirkf
Contributor

dirkf commented Jun 9, 2024

With --keep-video, you should be able to decide whether to keep the segment videos or the concatenated version. Or --concat-playlist never should just leave the segment videos.

@seproDev
Collaborator

seproDev commented Jun 9, 2024

If you pass the URL of an individual segment (Beitrag) you can also use --no-playlist to only download that segment.

yt-dlp --no-playlist "https://on.orf.at/video/14229684/15654778/cupal-orf-zur-geiselbefreiung"
[orf:on] Extracting URL: https://on.orf.at/video/14229684/15654778/cupal-orf-zur-geiselbefreiung
[orf:on] 14229684: Downloading JSON metadata
[orf:on] Downloading just the segment 15654778 because of --no-playlist
[orf:on] 15654778: Downloading m3u8 information
[orf:on] 15654778: Downloading m3u8 information
[orf:on] 15654778: Downloading MPD manifest
[orf:on] 15654778: Downloading MPD manifest
[info] 15654778: Downloading 1 format(s): dash-p0va0br3003433-1+dash-p0aa0br192000-1
[dashsegments] Total fragments: 14
[download] Destination: Cupal (ORF) zur Geiselbefreiung [15654778].fdash-p0va0br3003433-1.mp4
[download] 100% of   44.59MiB in 00:00:00 at 49.03MiB/s
[dashsegments] Total fragments: 14
[download] Destination: Cupal (ORF) zur Geiselbefreiung [15654778].fdash-p0aa0br192000-1.m4a
[download] 100% of    2.90MiB in 00:00:00 at 11.00MiB/s
[Merger] Merging formats into "Cupal (ORF) zur Geiselbefreiung [15654778].mkv"
Deleting original file Cupal (ORF) zur Geiselbefreiung [15654778].fdash-p0va0br3003433-1.mp4 (pass -k to keep)
Deleting original file Cupal (ORF) zur Geiselbefreiung [15654778].fdash-p0aa0br192000-1.m4a (pass -k to keep)

I don't think the suggestions by dirkf would work for this video, as no concatenation happens due to a non-segmented version existing. If there is really a demand for this, someone could add an extractor arg.
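
The two URL forms seen in this thread differ only in the optional segment ID that follows the programme ID. A minimal sketch of telling them apart is shown here; the regular expression and the function name `split_orf_on_url` are assumptions made for illustration and are not the extractor's actual `_VALID_URL`.

```python
import re

# Hypothetical pattern for illustration only; NOT the extractor's real _VALID_URL.
ORF_ON_URL = re.compile(
    r'https?://on\.orf\.at/video/(?P<video_id>\d+)'
    r'(?:/(?P<segment_id>\d+)(?=[/?#]|$))?'   # optional numeric segment ID
    r'(?:/[^/?#]*)?'                           # optional human-readable slug
)


def split_orf_on_url(url):
    """Return (video_id, segment_id); segment_id is None for a whole-programme URL."""
    match = ORF_ON_URL.match(url)
    if match is None:
        raise ValueError(f'not an ORF ON video URL: {url}')
    return match.group('video_id'), match.group('segment_id')


if __name__ == '__main__':
    print(split_orf_on_url('https://on.orf.at/video/14229684'))
    print(split_orf_on_url(
        'https://on.orf.at/video/14229684/15654778/cupal-orf-zur-geiselbefreiung'))
```

Only a URL that carries a segment ID gives --no-playlist something to narrow down to; a whole-programme URL has no segment to select.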

@dirkf
Contributor

dirkf commented Jun 9, 2024

On closer examination, what seproDev said.

Ofc there are tools outside the scope of this project for detecting and removing ad breaks if you are left with the single-file version.

@arminfuerst
Author

Thank you, but neither answer addresses my suggestion. For the given example, I'd expect 13 video files. In other examples, I'd shorten some of these video files (e.g. with AVIDemux) and concatenate them into the whole video. So I don't expect yt-dlp to detect and remove ad breaks. Having the segments in separate files would just make life much easier.
yt-dlp --no-playlist "https://on.orf.at/video/14229684/15654778/cupal-orf-zur-geiselbefreiung" is part of the solution; I'd just like to provide the main URL and get all of these segments.
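
The "trim some segments, then rejoin" workflow described above could be sketched as follows, assuming the segments already exist as separate files and that ffmpeg's concat demuxer is used for the rejoin step. The function name `build_concat_list` and the file names are hypothetical; nothing here is part of yt-dlp.

```python
def build_concat_list(segment_files):
    """Return the text of an ffmpeg concat-demuxer list naming the files in order."""
    lines = []
    for name in segment_files:
        # Inside a single-quoted entry, a literal quote is written as '\''
        escaped = str(name).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    # Hypothetical file names; the real ones depend on the output template in use.
    print(build_concat_list(['01 - Signation.mp4', '02 - Erster Beitrag.mp4']), end='')
```

Saving that text as, say, list.txt and running `ffmpeg -f concat -safe 0 -i list.txt -c copy joined.mp4` should rejoin the files without re-encoding, provided all segments share the same codecs and parameters.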

@dirkf
Contributor

dirkf commented Jun 9, 2024

So you're asking for the extractor arg to override selecting a single video instead of the multi_video segment playlist, when both are available?

@arminfuerst
Author

arminfuerst commented Jun 9, 2024

I'm not sure I fully understand your question. I'm not even sure whether this option already exists and I just haven't found it yet.
Let me try to explain in other words.
orf:on provides two types of videos:

  1. whole/single videos
  2. segmented videos

No matter which type of video I download, yt-dlp downloads it as a single video, which is already very good!
I'd like an option so that, when I provide the URL of a segmented video, I can get a separate video for each segment. For me, it would also be fine if it always kept all segments plus the full video, but I think this might confuse other users.
I hope this clarifies my suggestion.

@seproDev
Collaborator

seproDev commented Jun 9, 2024

What you want is currently not possible with yt-dlp. The current logic always uses the non-segmented video if it is available. For videos where no such version exists, --concat-playlist never lets you keep the original segments.

I assume something like this is what you want:

diff --git a/yt_dlp/extractor/orf.py b/yt_dlp/extractor/orf.py
index f1403d920..21adb9faa 100644
--- a/yt_dlp/extractor/orf.py
+++ b/yt_dlp/extractor/orf.py
@@ -550,7 +550,8 @@ def _real_extract(self, url):
             return self._extract_video_info(segment_id, selected_segment)

         # Even some segmented videos have an unsegmented version available in API response root
-        if not traverse_obj(api_json, ('sources', ..., ..., 'src', {url_or_none})):
+        if (self._configuration_arg('prefer_segmented_version', [False])[0]
+                or not traverse_obj(api_json, ('sources', ..., ..., 'src', {url_or_none}))):
             return self.playlist_result(
                 (self._extract_video_info(str(segment['id']), segment) for segment in segments),
                 video_id, **self._parse_metadata(api_json), multi_video=True)

Then you could run:

yt-dlp --extractor-args "ORFON:prefer_segmented_version=true" --concat-playlist never "https://on.orf.at/video/14229684"

to download all individual segments without concatenation.
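
The condition changed by the patch can be illustrated in isolation. This is a rough sketch under stated assumptions, not the extractor's code: the function name `choose_entries`, the `segments` and `id` keys, and the exact shape of `api_json` are hypothetical; only the decision itself (use the segments when requested, or when no unsegmented source exists) follows the diff.

```python
def choose_entries(api_json, prefer_segmented=False):
    """Return the IDs that would be downloaded under the patched condition."""
    # Assumed shape: api_json['sources'] maps a protocol name to a list of
    # {'src': url} dicts, loosely following the traversal path in the diff.
    has_unsegmented = any(
        source.get('src')
        for sources in (api_json.get('sources') or {}).values()
        for source in sources
    )
    segments = api_json.get('segments') or []  # hypothetical key name
    if prefer_segmented or not has_unsegmented:
        return [str(segment['id']) for segment in segments]
    return [str(api_json['id'])]


if __name__ == '__main__':
    # Made-up response; the second segment ID and the URL are placeholders.
    api_json = {
        'id': 14229684,
        'sources': {'dash': [{'src': 'https://example.invalid/manifest.mpd'}]},
        'segments': [{'id': 15654778}, {'id': 15654779}],
    }
    print(choose_entries(api_json))
    print(choose_entries(api_json, prefer_segmented=True))
```

Without the flag, the unsegmented source wins whenever it exists, which matches the behaviour described earlier in this comment; with the flag, the segment list is returned even though an unsegmented source is present.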

seproDev added the patch-available label (There is patch available that should fix this issue. Someone needs to make a PR with it) and removed the triage (Untriaged issue) label Jun 9, 2024
@arminfuerst
Author

Thanks. I'm not sure how I could try this patch. As far as I understand it, this might work. If it is included in the next (or a future) nightly build, I can try it as soon as that build is available.

@dirkf
Contributor

dirkf commented Jun 9, 2024

Might it not be reasonable for the extractor to assume that, if --concat-playlist never is passed, it should prefer the segment playlist when available, thus avoiding the need for an extractor arg?

@arminfuerst
Author

Perhaps a new value for --concat-playlist would also make sense? Having the same parameter behave differently depending on the site might be confusing.

@seproDev
Collaborator

seproDev commented Jun 10, 2024

I don't think --concat-playlist never should be given a second meaning for this extractor only. On the site, these videos are presented as a single continuous video. IMO, a user with --concat-playlist never in their config would not expect individual segments to suddenly be downloaded.
This is exactly why extractor args were introduced. I don't see a reason why they shouldn't be used here.

@arminfuerst
Author

Should this patch be part of [email protected]?

@seproDev
Collaborator

No. The issue will be closed once this has been addressed.

@arminfuerst
Author

When I started a download today with the version 2024.06.22.232706 and the URL https://on.orf.at/video/14231734/formel-1-grosser-preis-von-spanien without the extractor argument but with "-k", the files were kept separately. I realized a relevant detail: The names of the files should be prefixed with a number so the correct order of these files is obvious. Would it be possible to implement this?

@bashonly
Member

> The names of the files should be prefixed with a number so the correct order of these files is obvious. Would it be possible to implement this?

This is the responsibility of the user; you can use an output template that includes %(playlist_index)s
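
Output templates use Python-style `%(name)s` fields, so the effect of a `playlist_index` prefix can be previewed with plain Python string formatting. The sketch below is only an approximation under assumptions: the segment titles and IDs are invented, and plain `%` formatting does not pad numbers on its own, so the padded variant uses `02d` explicitly; yt-dlp's own handling of this field may differ.

```python
PADDED = '%(playlist_index)02d - %(title)s [%(id)s].%(ext)s'
UNPADDED = '%(playlist_index)s - %(title)s [%(id)s].%(ext)s'


def render(template, entries):
    """Format one file name per entry using Python %-style named fields."""
    return [template % entry for entry in entries]


# Invented metadata for 13 segments; real titles and IDs come from the site.
entries = [
    {'playlist_index': i, 'title': f'Beitrag {i}', 'id': f'seg{i:02d}', 'ext': 'mp4'}
    for i in range(1, 14)
]

padded = render(PADDED, entries)
unpadded = render(UNPADDED, entries)

if __name__ == '__main__':
    print(padded[0])
    print('padded names sort in playlist order:', sorted(padded) == padded)
    print('unpadded names sort in playlist order:', sorted(unpadded) == unpadded)
```

With 13 segments, the zero-padded names sort alphabetically in playlist order, while the unpadded ones would place "10" before "2" in a file manager.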


4 participants