-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No more sending on closed channels during checkscontroller shutdown #245
base: main
Are you sure you want to change the base?
Conversation
First metric announces registration state.
When teh sparrow check controller would shut down, it would iterate over the actual slice of the checks, shutdown each check and then procceed to delete said check from the slice. Since the shutting down procedure is instant, there was a race condition that would delete a wrong check from the slice and then the same shutting down loop would try and shutdown the same check again. Just returning a copy of the slice resolves this problem, as the iteration is now done on the copy only. A more sophisticated deletion routine for the checks slice could be another way to handle this, but it would probably increase the complexity of the checks and checkscontroller structs.
Signed-off-by: Bruno Bressi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -53,5 +54,5 @@ func (c *Checks) Delete(check checks.Check) { | |||
func (c *Checks) Iter() []checks.Check { | |||
c.mu.RLock() | |||
defer c.mu.RUnlock() | |||
return c.checks | |||
return slices.Clone(c.checks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK will this be a problem. When slices.Clone(c.checks)
is called, it creates a copy of the checks. Any updates applied to these cloned instances will not affect the original running instances. This means that the original checks will never receive the updated configurations, which can prevent them from functioning correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll write some tests to replicate the problem. And then we can talk about how to better fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news: the tests show that the change works as intended. Or at least I think they do :)
The shallow copy still provides a slice with the references intact. Each check instance will run its UpdateConfig
and the actual check in the checks
field of the controller will get updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense, but let me give it a try in a manual e2e test just to make sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it was the next thing on my list. Let's sit down after you're done with your exams and talk about your e2e branch 🌿
This test proves that the shallow clone works as intended and returns a clone of the slice, where the original references can still be used and updated.
This should fix the bodyclose linting remarks Signed-off-by: Bruno Bressi <[email protected]>
Motivation
Fixes #244
Changes
The change is just one LoC: return a copy of the
checks
slice when callingIter()
, which TBH is a safer and simpler way to handle this slice (long live functional programming).The rest is just some light housekeeping:
Tests done
The manual test case mentioned in the issue was carried out; started the sparrow and sent a keyboard interrupt. The error doesn't occur anymore:
Additionally, I've added a new unit test case (which does fail in the main branch ATM). The test actually tests the shutdown function of the
ChecksController
, which we had unfortunately neglected.TODO