Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error messages when a pending Trusted Publisher's project name already exists #17405

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

facutuesca
Copy link
Contributor

@facutuesca facutuesca commented Jan 13, 2025

The problem

There are some cases where we validate the name of a new project, where the error messages could be better:

The implementation

This PR removes the enum used for describing the invalid project name errors, replacing it with simple classes that allow including some metadata in each error object. This metadata allows us to tell if the current user is the owner of the existing project, which we then use to display better error messages.

# Before
class ProjectNameUnavailableReason(enum.Enum):
    Invalid = "invalid"
    Stdlib = "stdlib"
    AlreadyExists = "already-exists"
    Prohibited = "prohibited"
    TooSimilar = "too-similar"

# After
class ProjectNameUnavailableInvalid:
    pass
class ProjectNameUnavailableStdlib:
    pass
class ProjectNameUnavailableProhibited:
    pass

class ProjectNameUnavailableExisting:
    def __init__(self, existing_project: Project):
        self.existing_project: Project = existing_project
class ProjectNameUnavailableSimilar:
    def __init__(self, similar_project: Project):
        self.similar_project: Project = similar_project

ProjectNameUnavailableError = (
    ProjectNameUnavailableInvalid
    | ProjectNameUnavailableStdlib
    | ProjectNameUnavailableExisting
    | ProjectNameUnavailableProhibited
    | ProjectNameUnavailableSimilar
)

Screenshots

Creating a pending Trusted Publisher

Project already exists and user is owner

Screenshot 2025-01-13 at 22 39 42

Project already exists and user is not owner

Screenshot 2025-01-13 at 22 42 15

This fixes #17392

cc @di @woodruffw

if (
existing_project := self.db.query(Project)
.where(Project.normalized_name == func.normalize_pep426_name(name))
.scalar()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what would happen in the event that two conflicting names exist? We didn't always enforce this and did not retro-actively "clean up" so it could happen?

I think in the pre-existing implementation papers over that because the "exists" clause will end up a boolean through the "scalar" clause, where as after I think you'll get a list of Project objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't scalar return the first result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall it doing so if one and only one exists, otherwise a multiple results exception.

Would you mind adding a test to validate the behavior if two preexisting conflicting names exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh yeah, I was looking at the wrong docs. Good catch! Fixed by replacing scalar() with first(), and added a test for it.

@samuelcolvin
Copy link

Thanks so much for fixing so quickly ❤️ .

Comment on lines 79 to 83
_(
"This project name is too similar to an existing project "
"named '${name}'",
mapping={"name": similar_project.name},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_(
"This project name is too similar to an existing project "
"named '${name}'",
mapping={"name": similar_project.name},
)
_("This project name is too similar to an existing project")

I know we get requests to do this often, but I don't generally think there's a ton of value in providing the conflicting project name here for two reasons:

  1. we're only providing one conflicting name, but there could be multiple;
  2. by providing a conflicting name, there's not a lot the user can do with it besides track down the owner, try to contact them directly, and get them to delete their project, which I think is less than ideal.

I'd rather have users submit support requests to acquire the project name and let us release names when appropriate so both projects can continue to exist.

Copy link
Contributor Author

@facutuesca facutuesca Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1, we could return a list of project names if that was the case.

For 2, I think that only makes sense if the conflicting project is abandoned, in which case yes, the user could try to contact the owner to get them to delete it.
But for active projects, it's useful for the user to know which project is conflicting: if it's an active or well-known package they probably won't want to take over it.

Knowing exactly which project is too similar in name allows the user to make the decision if they should go forward or not with trying to get that name, whereas otherwise they are forced to go through PyPI support without knowing which package they are trying to get deleted/released.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is when a user explicitly wants a different name than the conflicting name though, so the ideal situation would be that they could go forward with publishing under the original name, and not have to take over or delete a slightly different name in order to use the original name.

Adding multiple conflicting names to the error message would just exacerbate this I think: what would we expect the user to do with that information? The better outcome would be that regardless of what the conflicting names(s) are, they are able to request the original name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1, we could return a list of project names if that was the case.

I think this could work, although in principle the list of conflicting names is unbounded (or could become unbounded with a future restriction on similarity).

I think @di's point with contact is that maintainers will find it annoying/frustrating to have deletion requests directed at them first, especially since there's typically no (strong) public signal that a project is actually abandoned. There's also a possibility that less-experienced users think that the project itself is imposing the restriction, which will lead to reports from those users that the projects can't actually do anything about (e.g. "pip is preventing me from uploading p1p to PyPI").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the ideal situation would be that they could go forward with publishing under the original name, and not have to take over or delete

Ah, I wasn't aware that this could be done (allowing a new project that has a too-similar name as an existing one). I see you mentioned at the end of your original comment, I just didn't register it 🤦

I see your point. I still think knowing the name might be useful in some cases (for example, if I'm trying to create p1p, and I see that it conflicts with pip, I will just find a different name rather than ask the PyPI admins about it).

But I'm guessing this is rare, and not worth modifying the error message for.
WDYT about the rest of the PR? Should I remove the changes related to the too-similar error, and leave the rest as-is?. Or should we go with a different implementation, since we're only adding extra information to the "already exists" error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, removed the name from the error message

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will say I've had a lot of luck asking people to delete dormant packages so I can use the name, pretty quickly.

But I've had very little luck with support requests, e.g.they take ages and don't always work. E.g. we've reported copyright legal violations and had security concerns and had no response after months.

Having the package name would be very useful in real world scenarios.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. we've reported copyright legal violations and had security concerns and had no response after months.

@samuelcolvin For security-concern requests going without response, can you elaborate where these were reported? I monitor the security inbox closely and investigate violations pretty quickly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @miketheman, sorry for the slow response - we had a company offsite which was all consuming.

The case I was thinking of was pypi/support#5187 where we had (I think) a legitimate legal and security concern, and it took months to get a response. My point with regard to this issue was just that historically "just ask admins to take X down" has not been a quick solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @samuelcolvin ! No worries, hope the offsite went well!
Thanks for sharing the case in question. I hear you on the requests - PyPI got a support specialist in July who's been cranking through account recovery requests, followed by PEP 541 requests, so there's definitely an uptick in the pace of resolution.
In this particular case, following the instructions in PEP 541 Intellectual Property policy is probably the best move.

This change improves the error messages when uploading a new project
or creating a pending Trusted Publisher, when the new project's name
already exists or is too similar to an existing project.

Signed-off-by: Facundo Tuesca <[email protected]>
Signed-off-by: Facundo Tuesca <[email protected]>
Fix the check for projects with too-similar names when
there is more than one existing project with the same
ultranormalized name.

Signed-off-by: Facundo Tuesca <[email protected]>
@facutuesca facutuesca force-pushed the ft/improve-upload-pkg-error-msgs branch from 1e208d0 to 882ebe3 Compare January 14, 2025 19:15
@facutuesca facutuesca changed the title Improve error messages when a new project's name is too similar to an existing one Improve error messages when a pending Trusted Publisher's project name already exists Jan 14, 2025
@@ -1130,3 +1130,34 @@ class AlternateRepository(db.Model):
name: Mapped[str]
url: Mapped[str]
description: Mapped[str]


class ProjectNameUnavailableInvalid:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: this doesn't feel particularly model-centric - why is this no longer appropriate as part of the interface?

Comment on lines +90 to +101
if is_project_owner:
# The project settings URL is only shown in the error message if
# the user is the owner of the project
assert route_url.calls == [
pretend.call(
"manage.project.settings.publishing",
project_name="some-project",
_query={"provider": {"activestate"}},
)
]
else:
assert route_url.calls == []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Instead of parametrizing the test for a boolean on owner, duplicate the test and remove conditional logic from the test case, and name the new test case appropriately. Reduces complexity for a little duplication. This note applies to all test cases.

self.similar_project: Project = similar_project


ProjectNameUnavailableError = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: The *Error name here made me think we're starting to treat these like exceptions, but they aren't.

Would using explicit exceptions, and adopting a try/except block in the caller be better here?

if self.db.query(
exists().where(
if (
similar_project := self.db.query(Project)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the project name is no longer returned, are these changes necessary for the assignment? The previous call of exists()... is less expensive than creating a Project object as a result, especially if we end up throwing it away.

Alternately, if we do only need the name, this could be rewritten as selecting only the name, something like so?

if (similar_project_name := self.db.scalars(
    select(Project.name)
    .where(func.ultranormalize_name(Project.name) == func.ultranormalize_name(name))
    .first()
):
...

as then similar_project_name would be a string, not an object, we we don't need the entire thing.

There's also something subtly wrong with selecting the first result and returning the name, since there may be other collisions (unlikely, but not impossible) - have you considered that in this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trusted Publisher error message is confusing when project exists but user doesn't have ownership
6 participants