Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't replace underscores with hyphens in distribution name #921

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

navrkald
Copy link

@navrkald navrkald commented Aug 16, 2022

Fixes #920

@bhrutledge bhrutledge changed the title Fixes #920. Don't replace unserscores with hyphens in distribution name. Don't replace unserscores with hyphens in distribution name. Aug 16, 2022
@@ -51,12 +51,12 @@
def _safe_name(name: str) -> str:
"""Convert an arbitrary string to a standard distribution name.

Any runs of non-alphanumeric/. characters are replaced with a single '-'.
Any runs of non-alphanumeric/./_ characters are replaced with a single '-'.

Copied from pkg_resources.safe_name for compatibility with warehouse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More that this comes from elsewhere and is intended to match behavior with PyPI (the server we care most about). This ostensibly breaks that compatibility

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. Pls check https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#name

Comparison of project names is case insensitive and treats arbitrarily-long runs of underscores, hyphens, and/or periods as equal. For example, if you register a project named cool-stuff, users will be able to download it or declare a dependency on it using any of the following spellings:

Cool-Stuff
cool.stuff
COOL_STUFF
CoOl__-.-__sTuFF

So underscores and hyphens shouldn't matter in PyPI and it should be backward compatible.
As well this PR is just correct implementation of PEP 508.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there has been some recent work on this, and possibly some changes. I want to dig into that before reviewing. That said, the linked guide is outdated; https://setuptools.pypa.io/en/stable/ is the authoritative documentation for setuptools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some initial research, I don't know what's the right thing to do, and it seems like there's ongoing discussion about package/distribution name normalization:

(cc @pfmoore @dstufft @ewdurbin as contributors to the above discussions)

As well this PR is just correct implementation of PEP 508.

It seems like _safe_name is attempting to implement normalization defined in PEP 503.

For additional context, the issue that led to this PR states:

This behaviour is causing problems e.g. if you have defined in Artifactory mandatory distribution name prefix containing underscores. E.g. org_name_prefix. So if you have package named org_name_prefix_my_package it is converted by twine to distribution name org-name-prefix-my-package and then Artifactory refuses to publish such package.

I think the important question here is: if Twine doesn't convert underscores to hyphens, will that be compatible with PyPI?

Additionally, it makes me a little nervous that Twine has its own normalization function. Has this logic been centralized anywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the important question here is: if Twine doesn't convert underscores to hyphens, will that be compatible with PyPI?

I can't answer that, I'm afraid. But I do think that PyPI and twine should agree on either allowing unmodified distribution names (as defined in the project metadata) or using the same normalisation (and IMO PEP 503 is the main contender for a "universal" normalisation rule these days). On the other hand, PyPI should have access to the un-normalised name, so it can respect the project author's wishes in the UI and similar places. I don't have context for this issue, but if the field we're discussing here is the only place that the project name is passed to PyPI, I'd say that means that it should be passed unmodified.

My impression is that PyPI has much more difficult compatibility considerations, though, so for practical reasons it might be necessary to "follow what PyPI allows" for the short term. I'd view that as a temporary measure, though.

Has this logic been centralized anywhere?

Packaging has packaging.utils.canonicalize_name which (as far as I know) implements PEP 503 normalisation. But the PEP 503 rule is designed to be easy to copy and paste, so I'd consider replicating it to avoid a dependency on packaging as perfectly acceptable.

Copy link
Contributor

@bhrutledge bhrutledge Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I agree that Twine shouldn't normalize filenames (which it doesn't currently do).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see, I missed it switched back to effectively using safe_name().

So, the problem is still roughly the same, just being caused by the fact that safe_name normalizes _ to -.

You should not have to do any normalization of the name for PyPI.

Where PyPI wants things to be normalized it normalizes it itself, where it doesn't, it doesn't.

I think there was a time when that wasn't the case, and you had to do some normalization, unfortunately #70 and the linked issue #47 don't provide enough information for me to remember specifics.

If it were me, I'd be inclined to try removing all normalization from the name field and see if PyPI complains. I'm pretty sure it won't 1, but if it does then that would be a PyPI bug IMO.

Footnotes

  1. Assuming that the unnormalized name of the project matches the filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that twine should upload the file with the name as given (i.e., not normalise) I'm a bit concerned about the idea that neither twine nor the index server is validating that the files being uploaded are correct - where "correct" implies "normalised". But I guess twine can take a "trust the user" attitude, and servers might be prepared to accept unnormalised names. I'm not sure how the rest of the ecosystem would handle unnormalised wheel names (sdist names are still a bit of a mess so "legacy" forgives a lot there). I'm going to say that's not my problem though 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Difference that's splitting hairs but twine takes a trust the build-system approach followed by a trust the repository server to error with a semi-helpful error response we can provide to the user.

Genuinely, I'm not certain twine needs to exist. It provides very little benefit at this point. Most of what it does for upload can now be achieved with the standard library (historically not the case) on most (if not all of the supported versions). Most of the checks it has are simplistic and could be done by a build system to verify its output during a finalisation stage (where it would provide more value and immediate feedback to the user closer to where the issue is).

There was a dream of twine becoming a build-then-upload tool and way of hiding every build backend but neither Brian nor I have made that happen. Speaking for myself, I certainly don't have the time or desire for that. With the number of build backends, it would likely be a nightmare to paper over all their APIs (assuming they're supporting an API for us to use) and we'd have to work with whatever existing config file they're using, section, etc. Just doesn't seem worth the effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a standard API for it, so you wouldn't need to paper over anything, just use the standard APIs... that being said, there's already a project doing that (https://github.com/pypa/build) now so adding another one seems silly.

@bhrutledge bhrutledge self-requested a review August 18, 2022 12:19
@bhrutledge bhrutledge changed the title Don't replace unserscores with hyphens in distribution name. Don't replace unserscores with hyphens in distribution name Aug 24, 2022
@bhrutledge bhrutledge changed the title Don't replace unserscores with hyphens in distribution name Don't replace underscores with hyphens in distribution name Aug 24, 2022
@Kodiologist
Copy link

As an outsider, I think transforming (via _safe_name) the name the user requested, without confirmation, is in general a bad idea. You can't really undo a PyPI upload. It would better to just stop with an error if the name won't work.

@s2t2
Copy link

s2t2 commented Mar 27, 2024

I used an underscore in my package directory name (e.g. "my_package"), as well as in the "setup.py" file (name: "my_package"), and in order to install via pip, we need to pip install my_package.

BUT the PyPI site displays the installation command improperly (pip install my-package) due to the undesired underscore to hyphen conversion.

See: https://pypi.org/project/gspread-models/

So we should definitely not do the conversion. Right?

@pradyunsg
Copy link
Member

FWIW, PyPI is now parsing names and canonicalising them internally.

https://github.com/pypi/warehouse/blob/3accdda633119eb1672563f62aee61892d89637e/warehouse/forklift/legacy.py#L1002

https://github.com/pypi/warehouse/blob/3accdda633119eb1672563f62aee61892d89637e/warehouse/forklift/legacy.py#L1019

This means that we can safely revert #745 now and switch to using canonical names when interacting with the PyPI. It might be worth considering what the effects of doing so could be on non-PyPI indexes (eg: devpi).

@s2t2
Copy link

s2t2 commented Jun 4, 2024

What does this latest reply mean, in simpler terms? Having a hard time understanding the message that is trying to be conveyed.

@sigmavirus24
Copy link
Member

The canonicalized name as defined in the relevant PEPs and implemented in the packaging library must match the file name if I remember correctly. The canonicalized name is based off of the name provided in the packaging metadata.

For example, example_project.py would be canonicalized to example-project-py. PyPI is now both performing the translation and enforcing it such that if an upload is sent which violates this a 400 Bad Request is returned indicating that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants