Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type validation with pydantic #307

Open
sigma67 opened this issue Oct 5, 2022 · 8 comments
Open

Type validation with pydantic #307

sigma67 opened this issue Oct 5, 2022 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed
Milestone

Comments

@sigma67
Copy link
Owner

sigma67 commented Oct 5, 2022

Something I've been meaning to implement in the long term is proper models and type validation for the responses from YouTube Music. This would provide better data validation and would notify sooner if something on the API side has changed, as opposed to the current tests which usually check for the length of the results in the response only.

pydantic seems to be the gold standard for this. I'd like to wait for the release of v2.0 to not have to deal with breaking changes from v1 once it comes out.

Please thumbs up if you're in favor and leave a comment if you're willing to help or contribute 🥳

@sigma67 sigma67 added enhancement New feature or request help wanted Extra attention is needed labels Oct 5, 2022
@sigma67
Copy link
Owner Author

sigma67 commented Jan 4, 2023

Pydantic v2 ETA is now Q1 2023, so we can start work on this Q2 at the earliest

@sigma67 sigma67 added this to the 1.0 milestone Mar 24, 2023
@sigma67 sigma67 modified the milestones: 1.0, 2.0 Apr 8, 2023
@ambujpawar
Copy link

I am willing to help :)

@sigma67
Copy link
Owner Author

sigma67 commented Apr 20, 2023

Pydantic v2 is now in alpha: https://docs.pydantic.dev/blog/pydantic-v2-alpha/

I think we should use this issue as a starting point to create additional subtasks using GitHub tasklists: https://docs.github.com/en/issues/tracking-your-work-with-issues/about-tasklists

As this is a major endeavor with breaking changes development should take place on a v2 branch for now.

@JohnHKoh
Copy link
Contributor

Started playing around with this, here is a rudimentary way to provide type validation.

I created a new module called where I started to define some types. The Playlist type, for example, looks like this:

class Playlist(BaseModel):
    title: str
    playlistId: str
    thumbnails: list[Thumbnail]
    description: str
    count: Optional[str] = None
    author: Optional[list[Author]] = None

Then, inside "parsers/browsing.py", we can create a new instance of this Playlist class with the information we parse from parse_playlist:

def parse_playlist(data):
    playlist = {
        'title': nav(data, TITLE_TEXT),
        'playlistId': nav(data, TITLE + NAVIGATION_BROWSE_ID)[2:],
        'thumbnails': nav(data, THUMBNAIL_RENDERER)
    }
    subtitle = data['subtitle']
    if 'runs' in subtitle:
        playlist['description'] = "".join([run['text'] for run in subtitle['runs']])
        if len(subtitle['runs']) == 3 and re.search(r'\d+ ', nav(data, SUBTITLE2)):
            playlist['count'] = nav(data, SUBTITLE2).split(' ')[0]
            playlist['author'] = parse_song_artists_runs(subtitle['runs'][:1])

    return Playlist(**playlist) # Return playlist instance

(Note, this could probably be done at a "lower level" without having to create the intermediary playlist dictionary first)

Then, we can add some type hinting in "mixins/library.py"...

    def get_library_playlists(self, limit: int = 25) -> list[Playlist]:

And now we get some more helpful type hinting with the new object.
image

The data validation works too. For example, in the Playlist class, I specify count as optional, because I believe empty playlists do not have the count field. If I remove the optional type and make it required, I get a validation error for a playlist that does NOT have the count field:

  Field required [type=missing, input_value={'title': 'Episodes for L...des you save for later'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/0.38.0/v/missing

I can continue to help with this work - I think it can be very helpful for maintaining the API as well as improving usability with the type hints. Here's a quick branch comparison between the testing I did and the current master branch:

https://github.com/sigma67/ytmusicapi/compare/master...JohnHKoh:ytmusicapi:v2-pydantic-test?expand=1

@sigma67
Copy link
Owner Author

sigma67 commented Jun 10, 2023

Looks like a good start! I like that it's not very invasive to the current code base, at least not in this case.

I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.

Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.

I also noticed that you used list as opposed to List, we would need to increase requires_python>=3.9 for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

@JohnHKoh
Copy link
Contributor

I think we should probably start by dumping the currently returned responses to make sure we don't miss any fields. Then we can have a look at the code to determine which fields need to be optional, for each function.

Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.

Did you use pydantic V2 beta? I think you'd also need to add that to the dependencies.

I was using v2 beta in a venv, but yes, it will need to be added. I was just playing around with integrating it into the code, so nothing I wrote was really meant to be final! Just a start. The way we "convert" the raw data into the new types is definitely an implementation detail that could warrant more discussion. Should it use some kind of constructor? Or some other generic parsing function?

I also noticed that you used list as opposed to List, we would need to increase requires_python>=3.9 for that to happen. Which is fine by me, as 3.8 will only have 1 year of support left by the time we ship this change.

I also support the push list, as it seems like the List from the typing module may be deprecated some time after October 2025.

@sigma67
Copy link
Owner Author

sigma67 commented Jun 25, 2023

Definitely, anywhere we can consolidate these? Should we just post them in here? The greater the sample size the better, so we don't miss any edge cases or differences.

I don't think we need to many different samples, just need to ensure that the edge cases are covered (like unavailable songs, which have some data fields, but not others). Instead of dumping we could also just go ahead and try with all fields optional and default None. That would already be an improvement over the current case where sometimes the keys are missing, making the library a bit cumbersome to use.

Should it use some kind of constructor? Or some other generic parsing function?

Definitely just use the default constructor

I also support the push list

Then let's go ahead with >=3.9.

@TheOnlyWayUp
Copy link

Hi, I recently made an API Wrapper that used Pydantic.

https://github.com/TheOnlyWayUp/Wattpad-Py

A tool I used prominently was https://jsontopydantic.com.

I'm going to draft up a PR this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants