✨ Add MachineDrainRule "WaitCompleted" #11545
base: main
bab1c1b to 211839e
@@ -281,6 +281,7 @@ func (d *Helper) EvictPods(ctx context.Context, podDeleteList *PodDeleteList) Ev
var podsToTriggerEvictionLater []PodDelete
var podsWithDeletionTimestamp []PodDelete
Please check the existing test coverage for the modified funcs and extend accordingly for the new case (we should have test coverage for everything, only new cases should be needed)
Let's also extend the NodeDrainTimeout e2e test to cover WaitCompleted (node_drain.go).
(Unfortunately, the whole drain implementation is a bit brittle, and was already before this PR, so e2e coverage would be really good. For example, with the current PR even Pods with the label set to waitcompleted would have been drained, because the machineDrainRulesFilter would have overwritten the result of the drainLabelFilter with behavior drain.)
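To illustrate the overwrite problem described in this comment, here is a minimal sketch of a filter chain in which the first filter that makes a decision wins, so a later rule-based filter cannot replace a label-based result. All names here (`Behavior`, `Filter`, `labelFilter`, `rulesFilter`, `Decide`, and the `example.io/drain` label key) are hypothetical stand-ins and not the identifiers or the fix used in this PR.

```go
package main

// Behavior is a hypothetical stand-in for the drain behavior of a Pod.
type Behavior string

const (
	Drain         Behavior = "Drain"
	Skip          Behavior = "Skip"
	WaitCompleted Behavior = "WaitCompleted"
)

// Pod is a simplified, hypothetical Pod with only the fields this sketch needs.
type Pod struct {
	Name   string
	Labels map[string]string
}

// Filter returns a behavior and whether it made a decision for the Pod.
type Filter func(p Pod) (Behavior, bool)

// labelFilter decides based on a hypothetical label key (an assumption).
func labelFilter(p Pod) (Behavior, bool) {
	switch p.Labels["example.io/drain"] {
	case "skip":
		return Skip, true
	case "wait-completed":
		return WaitCompleted, true
	}
	return "", false
}

// rulesFilter stands in for rule-based matching; in this sketch it always
// decides Drain, which is what would clobber a label decision if it ran
// unconditionally after labelFilter.
func rulesFilter(p Pod) (Behavior, bool) {
	return Drain, true
}

// Decide runs the filters in order and stops at the first one that decides.
// If no filter decides, the Pod falls back to Drain.
func Decide(p Pod, filters ...Filter) Behavior {
	for _, f := range filters {
		if b, ok := f(p); ok {
			return b
		}
	}
	return Drain
}
```

Calling `Decide(pod, labelFilter, rulesFilter)` for a Pod carrying the label value `wait-completed` yields `WaitCompleted` in this sketch, because the loop returns before `rulesFilter` is consulted. An e2e test would be the place to confirm that the real filters behave the same way.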
@@ -498,6 +507,10 @@ func (r EvictionResult) ConditionMessage(nodeDrainStartTime *metav1.Time) string
conditionMessage = fmt.Sprintf("%s\nAfter above Pods have been removed from the Node, the following Pods will be evicted: %s",
conditionMessage, PodListToString(r.PodsToTriggerEvictionLater, 3))
}
if len(r.PodsToWaitCompleted) > 0 {
@fabriziopandini Is this something that should be specifically handled at higher levels when the condition bubbles up (like we handled PDBs etc.)?
(could also be a follow-up PR)
func MakePodDeleteStatusWaitCompleted() PodDeleteStatus {
return PodDeleteStatus{
DrainBehavior: clusterv1.MachineDrainRuleDrainBehaviorWaitCompleted,
Reason: PodDeleteStatusTypeWaitCompleted,
Is it intentional that Pods with WaitCompleted basically don't have a drain order?
This means that Pods with "WaitCompleted" would never block eviction of other Pods (which might be a problem if the Pods with "WaitCompleted" depend on some other Pods to keep running)
(let's update the godoc comment of the order field and the webhook implementation according to the outcome of this discussion)
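The ordering question above can be made concrete with a small sketch. It assumes that Pods with a lower order are evicted first and that higher-order Pods wait until lower-order ones are gone, and it compares the two options under discussion via a flag. The types and the `EvictNow` function are hypothetical and simplified; they are not the PR's implementation.

```go
package main

// Behavior is a hypothetical stand-in for the drain behavior of a Pod.
type Behavior string

const (
	Drain         Behavior = "Drain"
	WaitCompleted Behavior = "WaitCompleted"
)

// PodDelete is a simplified, hypothetical record of a Pod still on the Node.
type PodDelete struct {
	Name     string
	Behavior Behavior
	Order    int32
}

// EvictNow returns the names of Drain Pods that may be evicted in this pass.
// If waitCompletedBlocks is false, WaitCompleted Pods are ignored for ordering
// and never hold back other Pods. If it is true, they take part in ordering
// and hold back Pods with a higher order until they have finished.
func EvictNow(pods []PodDelete, waitCompletedBlocks bool) []string {
	var minOrder int32
	found := false
	for _, p := range pods {
		if p.Behavior == WaitCompleted && !waitCompletedBlocks {
			continue
		}
		if !found || p.Order < minOrder {
			minOrder, found = p.Order, true
		}
	}
	var names []string
	for _, p := range pods {
		if found && p.Behavior == Drain && p.Order == minOrder {
			names = append(names, p.Name)
		}
	}
	return names
}
```

With a WaitCompleted Job at order 0 and a database Pod with behavior Drain at order 10, `EvictNow(pods, false)` returns the database Pod right away, which is the dependency risk raised in the comment. `EvictNow(pods, true)` returns nothing until the Job is no longer in the list.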
211839e to 533803e
[APPROVALNOTIFIER] This PR is NOT APPROVED
533803e to 8826d5b
Signed-off-by: Vince Prignano <[email protected]>
8826d5b to e22bc4d
/test ?
/test pull-cluster-api-e2e-main
@vincepri: The following test failed, say
What this PR does / why we need it:
This PR adds the ability for drain to wait for the completion of specific Pods. This is useful either when draining is handled outside the context of kubectl drain after a Node is cordoned, or for long-running batch Jobs that should be allowed to terminate on their own.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
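As a rough illustration of what "waiting for completion" could mean, the sketch below treats a Pod as completed once it reaches the Kubernetes phase Succeeded or Failed and reports which Pods are still holding up the drain. The local `Pod` type and the `StillRunning` helper are hypothetical, and using the Pod phase as the completion signal is an assumption rather than something the PR description states.

```go
package main

// PodPhase mirrors the Kubernetes Pod phase strings used by this sketch.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a simplified, hypothetical Pod with only a name and a phase.
type Pod struct {
	Name  string
	Phase PodPhase
}

// StillRunning returns the names of Pods that have not reached a terminal
// phase. Under the assumption above, the drain keeps waiting while this
// list is non-empty.
func StillRunning(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if p.Phase != PodSucceeded && p.Phase != PodFailed {
			names = append(names, p.Name)
		}
	}
	return names
}
```

A caller would re-evaluate `StillRunning` on each reconcile and treat the WaitCompleted part of the drain as done once it returns an empty list.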