-
Notifications
You must be signed in to change notification settings - Fork 984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error messages when a pending Trusted Publisher's project name already exists #17405
base: main
Are you sure you want to change the base?
Improve error messages when a pending Trusted Publisher's project name already exists #17405
Conversation
7fc5519
to
e935be1
Compare
warehouse/packaging/services.py
Outdated
if ( | ||
existing_project := self.db.query(Project) | ||
.where(Project.normalized_name == func.normalize_pep426_name(name)) | ||
.scalar() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what would happen in the event that two conflicting names exist? We didn't always enforce this and did not retro-actively "clean up" so it could happen?
I think in the pre-existing implementation papers over that because the "exists" clause will end up a boolean through the "scalar" clause, where as after I think you'll get a list of Project objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't scalar
return the first result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall it doing so if one and only one exists, otherwise a multiple results exception.
Would you mind adding a test to validate the behavior if two preexisting conflicting names exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh yeah, I was looking at the wrong docs. Good catch! Fixed by replacing scalar()
with first()
, and added a test for it.
Thanks so much for fixing so quickly ❤️ . |
warehouse/oidc/forms/_core.py
Outdated
_( | ||
"This project name is too similar to an existing project " | ||
"named '${name}'", | ||
mapping={"name": similar_project.name}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_( | |
"This project name is too similar to an existing project " | |
"named '${name}'", | |
mapping={"name": similar_project.name}, | |
) | |
_("This project name is too similar to an existing project") |
I know we get requests to do this often, but I don't generally think there's a ton of value in providing the conflicting project name here for two reasons:
- we're only providing one conflicting name, but there could be multiple;
- by providing a conflicting name, there's not a lot the user can do with it besides track down the owner, try to contact them directly, and get them to delete their project, which I think is less than ideal.
I'd rather have users submit support requests to acquire the project name and let us release names when appropriate so both projects can continue to exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1, we could return a list of project names if that was the case.
For 2, I think that only makes sense if the conflicting project is abandoned, in which case yes, the user could try to contact the owner to get them to delete it.
But for active projects, it's useful for the user to know which project is conflicting: if it's an active or well-known package they probably won't want to take over it.
Knowing exactly which project is too similar in name allows the user to make the decision if they should go forward or not with trying to get that name, whereas otherwise they are forced to go through PyPI support without knowing which package they are trying to get deleted/released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is when a user explicitly wants a different name than the conflicting name though, so the ideal situation would be that they could go forward with publishing under the original name, and not have to take over or delete a slightly different name in order to use the original name.
Adding multiple conflicting names to the error message would just exacerbate this I think: what would we expect the user to do with that information? The better outcome would be that regardless of what the conflicting names(s) are, they are able to request the original name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1, we could return a list of project names if that was the case.
I think this could work, although in principle the list of conflicting names is unbounded (or could become unbounded with a future restriction on similarity).
I think @di's point with contact is that maintainers will find it annoying/frustrating to have deletion requests directed at them first, especially since there's typically no (strong) public signal that a project is actually abandoned. There's also a possibility that less-experienced users think that the project itself is imposing the restriction, which will lead to reports from those users that the projects can't actually do anything about (e.g. "pip
is preventing me from uploading p1p
to PyPI").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the ideal situation would be that they could go forward with publishing under the original name, and not have to take over or delete
Ah, I wasn't aware that this could be done (allowing a new project that has a too-similar name as an existing one). I see you mentioned at the end of your original comment, I just didn't register it 🤦
I see your point. I still think knowing the name might be useful in some cases (for example, if I'm trying to create p1p
, and I see that it conflicts with pip
, I will just find a different name rather than ask the PyPI admins about it).
But I'm guessing this is rare, and not worth modifying the error message for.
WDYT about the rest of the PR? Should I remove the changes related to the too-similar error, and leave the rest as-is?. Or should we go with a different implementation, since we're only adding extra information to the "already exists" error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, removed the name from the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say I've had a lot of luck asking people to delete dormant packages so I can use the name, pretty quickly.
But I've had very little luck with support requests, e.g.they take ages and don't always work. E.g. we've reported copyright legal violations and had security concerns and had no response after months.
Having the package name would be very useful in real world scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. we've reported copyright legal violations and had security concerns and had no response after months.
@samuelcolvin For security-concern requests going without response, can you elaborate where these were reported? I monitor the security inbox closely and investigate violations pretty quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @miketheman, sorry for the slow response - we had a company offsite which was all consuming.
The case I was thinking of was pypi/support#5187 where we had (I think) a legitimate legal and security concern, and it took months to get a response. My point with regard to this issue was just that historically "just ask admins to take X down" has not been a quick solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @samuelcolvin ! No worries, hope the offsite went well!
Thanks for sharing the case in question. I hear you on the requests - PyPI got a support specialist in July who's been cranking through account recovery requests, followed by PEP 541 requests, so there's definitely an uptick in the pace of resolution.
In this particular case, following the instructions in PEP 541 Intellectual Property policy is probably the best move.
This change improves the error messages when uploading a new project or creating a pending Trusted Publisher, when the new project's name already exists or is too similar to an existing project. Signed-off-by: Facundo Tuesca <[email protected]>
Signed-off-by: Facundo Tuesca <[email protected]>
Fix the check for projects with too-similar names when there is more than one existing project with the same ultranormalized name. Signed-off-by: Facundo Tuesca <[email protected]>
Signed-off-by: Facundo Tuesca <[email protected]>
1e208d0
to
882ebe3
Compare
@@ -1130,3 +1130,34 @@ class AlternateRepository(db.Model): | |||
name: Mapped[str] | |||
url: Mapped[str] | |||
description: Mapped[str] | |||
|
|||
|
|||
class ProjectNameUnavailableInvalid: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: this doesn't feel particularly model-centric - why is this no longer appropriate as part of the interface?
if is_project_owner: | ||
# The project settings URL is only shown in the error message if | ||
# the user is the owner of the project | ||
assert route_url.calls == [ | ||
pretend.call( | ||
"manage.project.settings.publishing", | ||
project_name="some-project", | ||
_query={"provider": {"activestate"}}, | ||
) | ||
] | ||
else: | ||
assert route_url.calls == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Instead of parametrizing the test for a boolean on owner, duplicate the test and remove conditional logic from the test case, and name the new test case appropriately. Reduces complexity for a little duplication. This note applies to all test cases.
self.similar_project: Project = similar_project | ||
|
||
|
||
ProjectNameUnavailableError = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: The *Error
name here made me think we're starting to treat these like exceptions, but they aren't.
Would using explicit exceptions, and adopting a try/except block in the caller be better here?
if self.db.query( | ||
exists().where( | ||
if ( | ||
similar_project := self.db.query(Project) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the project name is no longer returned, are these changes necessary for the assignment? The previous call of exists()...
is less expensive than creating a Project object as a result, especially if we end up throwing it away.
Alternately, if we do only need the name
, this could be rewritten as selecting only the name, something like so?
if (similar_project_name := self.db.scalars(
select(Project.name)
.where(func.ultranormalize_name(Project.name) == func.ultranormalize_name(name))
.first()
):
...
as then similar_project_name
would be a string, not an object, we we don't need the entire thing.
There's also something subtly wrong with selecting the first result and returning the name, since there may be other collisions (unlikely, but not impossible) - have you considered that in this change?
The problem
There are some cases where we validate the name of a new project, where the error messages could be better:
The implementation
This PR removes the
enum
used for describing the invalid project name errors, replacing it with simple classes that allow including some metadata in each error object. This metadata allows us to tell if the current user is the owner of the existing project, which we then use to display better error messages.Screenshots
Creating a pending Trusted Publisher
Project already exists and user is owner
Project already exists and user is not owner
This fixes #17392
cc @di @woodruffw