KEP-5027 + 5055: DRA: admin-controlled device attributes + device taints #5034
/cc @KobayashiD27 for the "device priority" use case.
/cc @byako for device health.
@pohly: GitHub didn't allow me to request PR reviews from the following users: KobayashiD27. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
keps/sig-node/5027-dra-admin-controlled-device-attributes/README.md
/wg device-management
These are two different KEPs that provide two features that can be enabled and disabled independently. However, both use the same new ResourceSliceOverride type and thus get described and implemented together.
There was earlier discussion of common, driver-independent tooling for listing, adding, and removing device taints. Would it make sense to mention something about that in the tainting KEP?
I deeply appreciate your quick action on the device taints/tolerations KEP! I left some comments. PTAL.
keps/sig-node/5027-dra-admin-controlled-device-attributes/README.md
identify the device (by name or with a CEL expression), manually create a
ResourceSliceOverride with a unique name, then remember to remove that
ResourceSliceOverride again. For beta, support in `kubectl` for common
operations may be needed.
These two paragraphs are also new and discuss the usability aspect. While spelling it out, I noticed that selecting a device by name was in fact not specified yet, so I added it to ResourceSliceOverride (see the class/driver/pool/device fields there).
As discussed today during the WG Device Management meeting, these two KEPs have no impact on the kubelet and thus should be owned by SIG Scheduling alone.
keps/sig-scheduling/5027-dra-admin-controlled-device-attributes/README.md
// and/or CEL selectors. All of these criteria must be satisfied by a device, otherwise
// it is ignored by the override. A DeviceOverride with no selection criteria is
// valid and matches all devices.
type DeviceOverride struct {
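The matching rule in the quoted doc comment (every selection criterion that is set must hold, and a filter without criteria matches every device) can be sketched as follows. This is a hypothetical simplification, not the KEP's Go API: `deviceRef`, `overrideFilter`, `matches` and `ptr` are illustrative names, only the name-based driver/pool/device criteria are modeled, and CEL selectors are left out.

```go
package main

import "fmt"

// deviceRef identifies one device in a ResourceSlice (hypothetical, simplified).
type deviceRef struct {
	Driver, Pool, Device string
}

// overrideFilter holds optional selection criteria; nil means "not set".
type overrideFilter struct {
	Driver *string
	Pool   *string
	Device *string
}

// matches reports whether d satisfies every criterion that is set.
// A filter with no criteria set matches all devices.
func (f overrideFilter) matches(d deviceRef) bool {
	if f.Driver != nil && *f.Driver != d.Driver {
		return false
	}
	if f.Pool != nil && *f.Pool != d.Pool {
		return false
	}
	if f.Device != nil && *f.Device != d.Device {
		return false
	}
	return true
}

// ptr returns a pointer to a copy of s.
func ptr(s string) *string { return &s }

func main() {
	gpu0 := deviceRef{Driver: "dra.example.com", Pool: "work-node", Device: "gpu-0"}
	gpu1 := deviceRef{Driver: "dra.example.com", Pool: "work-node", Device: "gpu-1"}

	byName := overrideFilter{Driver: ptr("dra.example.com"), Pool: ptr("work-node"), Device: ptr("gpu-0")}
	empty := overrideFilter{}

	fmt.Println(byName.matches(gpu0), byName.matches(gpu1)) // true false
	fmt.Println(empty.matches(gpu0), empty.matches(gpu1))   // true true
}
```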
I wonder if the API might be more human-readable by moving the set of filters/selectors into a separate parent struct, so that the actual override data (`Attributes`/`Capacity`) is distinctly organized and more easily disambiguated.
I moved them into a `DeviceOverrideSelector` struct which is stored in a `Selector` field. The actual YAML would then look like this:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSliceOverride
metadata:
  ...
spec:
  devices:
    selector:
      driver: dra.example.com
      pool: work-node
      device: gpu-0
    attributes:
      my-additional-attribute-foo:
        string: bar
The word "selector" for the field becomes a bit more problematic when considering CEL selectors inside it:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSliceOverride
metadata:
  ...
spec:
  devices:
    selector:
      deviceClass: dra.example.com
      selectors:
      - cel:
          expression: device.attributes["dra.example.com"].uid == "ABCD-1234"
    taints:
    - key: dra.example.com/unhealthy
      value: memory checksum error
      effect: NoSchedule
Selectors inside a selector? Hmm...
How about
spec:
  devices:
    filters:
      deviceClass: dra.example.com
      selectors:
      - cel:
          expression: device.attributes["dra.example.com"].uid == "ABCD-1234"
Works for me.
Should it be "filters" or "filter"? It's not a list.
Probably "filter" is the best compromise. A single "filter" can be a composition of a set of selectors, right? Even if we're playing fast and loose with grammar, I agree that "filters" suggests a list/array/slice rather than a dictionary.
Now it's `filter` and `DevicePatchFilter`.
keps/sig-scheduling/5027-dra-admin-controlled-device-attributes/README.md
The intent to override device attributes must be recorded persistently so that
it is preserved even when a ResourceSlice gets removed or updated. To achieve
this, a new cluster-scoped ResourceSliceOverride type gets added. A single
The fact that this new API enables partial (or no) overrides across the possible sets of `Attributes` + `Capacity` key-values, and that it also enables adding brand new key-values as an extension of the existing ResourceSlice data... does that suggest that `ResourceSlicePatch` is a more semantically expressive name for this new API? The term "override" doesn't entirely capture the tolerant outcome that our proposed merging strategy will yield.
Not trying to bike shed too much! wdyt?
(Patch is also expressive of our canonical use case: cluster admins updating a device driver as a part of node maintenance.)
That's a good suggestion. I was struggling with "override" myself when considering cases where the actual merge strategy isn't a strict "one value wins", for example for taints.
While making the change, I noticed one complication: the plural of "Patch" is "Patches". This non-standard plural form makes some of the API implementation icky. But good naming is worth that inconvenience, so let's go with it unless someone has a better suggestion.
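To illustrate why "override" fits attributes better than taints, here is a sketch of a merge under assumed semantics: an attribute set by a patch replaces (or adds to) the driver's value, while taints from the driver and from all matching patches accumulate. The `Device`, `Patch`, `Taint` types and `applyPatches` are hypothetical, and the order in which several patches are applied is not addressed by this sketch; it simply lets the last one win.

```go
package main

import (
	"fmt"
	"sort"
)

// Taint is a simplified device taint (key, value, effect).
type Taint struct {
	Key, Value, Effect string
}

// Device is a simplified view of one device.
type Device struct {
	Attributes map[string]string
	Taints     []Taint
}

// Patch is a simplified DevicePatch that already matched this device.
type Patch struct {
	Attributes map[string]string
	Taints     []Taint
}

// applyPatches returns a merged copy of base without mutating it. Assumed
// semantics: attributes use "one value wins" (a patch overwrites or adds a
// key), while taints from all patches are appended to the driver's taints.
func applyPatches(base Device, patches []Patch) Device {
	merged := Device{Attributes: map[string]string{}}
	for k, v := range base.Attributes {
		merged.Attributes[k] = v
	}
	merged.Taints = append(merged.Taints, base.Taints...)
	for _, p := range patches {
		for k, v := range p.Attributes {
			merged.Attributes[k] = v
		}
		merged.Taints = append(merged.Taints, p.Taints...)
	}
	return merged
}

func main() {
	base := Device{Attributes: map[string]string{"model": "example-gpu", "firmware": "1.0"}}
	patches := []Patch{
		{Attributes: map[string]string{"firmware": "1.1", "priority": "low"}},
		{Taints: []Taint{{Key: "dra.example.com/unhealthy", Value: "memory checksum error", Effect: "NoSchedule"}}},
	}
	got := applyPatches(base, patches)

	// Print attributes in a stable order.
	keys := make([]string, 0, len(got.Attributes))
	for k := range got.Attributes {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, got.Attributes[k])
	}
	fmt.Println("taints:", len(got.Taints))
}
```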
Thanks for considering this.
I think this is more expressive, and also makes the KEP itself a bit more readable as all the concepts come together more easily.
(Gentle reminder to propagate the final name throughout the taints/tolerations KEP once that's decided)
Done.
Speaking of patches...
As described right now, a `ResourceSlicePatch` cannot remove attributes. I don't have a specific use case in mind for it; I'm just seeing the gap.
One way of supporting it would be to add a `Remove *bool` in `DeviceAttribute` which can only be set in a `ResourceSlicePatch`:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSliceOverride
metadata:
  ...
spec:
  devices:
    selector:
      driver: dra.example.com
      pool: work-node
      device: gpu-0
    attributes:
      some-existing-attribute:
        remove: true
Would this be useful?
Note that this can break users' CEL expressions: if a vendor defines "some-existing-attribute is always set for our devices", then users don't need to check for existence. An admin removing it then causes attribute lookup errors for those users.
I don't see any immediate use case for this, but agree it seems worth enabling if it doesn't add too much overhead.
I suppose one other alternative would be something like defining an attribute with no value:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlicePatch
metadata:
  ...
spec:
  devices:
    selector:
      driver: dra.example.com
      pool: work-node
      device: gpu-0
    attributes:
      some-existing-attribute: {}
If an empty attribute can also be defined in a regular ResourceSlice like that and it's functionally equivalent to the attribute not being defined at all, then the semantics might be simpler than a distinct `remove` toggle. On the other hand, it's leaning a little into "magic" territory where it's not obvious what an empty value means just by looking at the API.
One tweak to that to make it a little more explicit might be to have an explicit null value for an attribute.
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlicePatch
metadata:
  ...
spec:
  devices:
    selector:
      driver: dra.example.com
      pool: work-node
      device: gpu-0
    attributes:
      some-existing-attribute:
        null: {}
This also looks a little weird though and will likely require some nonsense like this:
if attr.NullValue != nil {
	// attribute is null
}
This high-level approach would be more similar to something like a plain JSON merge patch, whereas having a separate `remove` field feels more like an RFC 6902 JSON patch. If we add a `remove` field, maybe we could instead add an `op` field like in the RFC and make `remove` one possible value for it alongside others like `add` or `replace`. That might make `remove` less of a special case at the expense of making the more common add/replace case a little more verbose.
I think the zero value approach is sustainable and equivalent to delete in terms of expressing "I want to override any actionable outcomes that the existing attribute's value may initiate".
- an attribute with a string `""` zero value is reliably equivalent to the attribute not existing (the Go idiom would suggest that if an empty string value `""` is significant, it should be implemented as a `*string` to disambiguate between an explicit `""` and "no user-provided value")
- any bool value can be equivalently "deleted" by setting it to `false` (if `false` is an explicit, non-default value, then it should be implemented as a `*bool`)
- struct values can be "deleted" via `{}`
- numeric values for which `0` is equivalent should be implemented as a pointer
- any pointer to a type can be equivalently "deleted" by setting the value to `nil`

Maybe I'm overthinking the above and the set of attribute types is more strictly constrained?
tl;dr I think the zero value approach is more elegant if it is reliably deterministic in the ways I've described.
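The Go idiom referenced in the list above (use a pointer when the zero value is meaningful) looks like this in practice. The `withValue`/`withPointer` types, the `Vendor` field and `describe` are made up for illustration and are not part of the DRA API.

```go
package main

import "fmt"

// withValue uses a plain string: "" is indistinguishable from "not set".
type withValue struct {
	Vendor string
}

// withPointer uses *string: nil means "not set", a pointer to "" means
// "explicitly empty".
type withPointer struct {
	Vendor *string
}

// describe reports which of the three states p is in.
func describe(p withPointer) string {
	switch {
	case p.Vendor == nil:
		return "unset"
	case *p.Vendor == "":
		return "explicitly empty"
	default:
		return "set to " + *p.Vendor
	}
}

func main() {
	empty := ""
	fmt.Println(withValue{} == withValue{Vendor: ""})  // true: cannot tell them apart
	fmt.Println(describe(withPointer{}))               // unset
	fmt.Println(describe(withPointer{Vendor: &empty})) // explicitly empty
}
```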
> some-existing-attribute: {}
The problem is that a null `DeviceAttribute` where all fields were left unset by the user cannot be distinguished from a future `DeviceAttribute` where some new field was set which the client doesn't know about yet. All fields in `DeviceAttribute` are part of a "one of": exactly one must be set for it to be valid. Receiving no fields from the apiserver tells clients that they are outdated and cannot handle the `DeviceAttribute`.
We use this in several places in the Kubernetes API to prevent clients from doing something that they shouldn't be doing because they don't know better. In this case, a client would remove an attribute instead of overriding it with some unknown value type. The explicit `remove: true` avoids that. So does `null: {}`. I like `null: {}` a little better.
> On the other hand, it's leaning a little into "magic" territory where it's not obvious what an empty value means just by looking at the API.
That's also true.
> I don't see any immediate use case for this, but agree it seems worth enabling if it doesn't add too much overhead.
I doubt that it adds overhead. It's mostly just extra work for the design (see discussion above...) and review.
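A sketch of why an attribute with no fields set is ambiguous under the one-of rule described above. `DeviceAttribute` here is a simplified, hypothetical subset of the real type, and `setFields`/`classify` are illustrative names showing the conservative reading a client is expected to apply. An old client that receives an attribute whose only populated field is one it cannot decode sees exactly the same value as `some-existing-attribute: {}`.

```go
package main

import "fmt"

// DeviceAttribute mimics a one-of: exactly one field should be set.
// The field set is a simplified, hypothetical subset.
type DeviceAttribute struct {
	IntValue    *int64
	BoolValue   *bool
	StringValue *string
	// A future API version might add more fields here that an old
	// client does not know about and therefore never decodes.
}

// setFields counts how many one-of members are populated.
func setFields(a DeviceAttribute) int {
	n := 0
	if a.IntValue != nil {
		n++
	}
	if a.BoolValue != nil {
		n++
	}
	if a.StringValue != nil {
		n++
	}
	return n
}

// classify shows how a client should read an attribute it received:
// zero known fields means "unknown value type", not "remove this".
func classify(a DeviceAttribute) string {
	switch setFields(a) {
	case 0:
		return "unknown type: client is outdated, do not act on it"
	case 1:
		return "known value"
	default:
		return "invalid: more than one field set"
	}
}

func main() {
	s := "bar"
	fmt.Println(classify(DeviceAttribute{StringValue: &s})) // known value
	// What an old client sees when the server sent a field it cannot decode:
	fmt.Println(classify(DeviceAttribute{})) // unknown type: client is outdated, do not act on it
}
```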
If a CEL expression fails for a device, the override does not apply and an
event will be generated for the ResourceSlicePatch with the faulty CEL
expression.
Does "fail" in this context mean an invalid CEL expression caused by something like a syntax error, and not that it cleanly evaluates to `false`?
"fails to evaluate to a boolean (runtime error, wrong result type)".
Syntax errors are caught during validation, but the attribute lookup is not type safe (`device.attributes[...].someField` may or may not be a bool) and can cause key lookup exceptions (in this case, if `someField` doesn't match any attribute).
I updated the paragraph.
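The failure handling described in this thread (a runtime error or a non-boolean result means the patch does not apply, and an event is recorded, while a clean `false` is just a non-match) can be sketched without a CEL library. `selectorFunc` is a hypothetical stand-in for a compiled CEL expression, `matchDevice` is an illustrative name, and `record` stands in for emitting an event on the ResourceSlicePatch.

```go
package main

import (
	"errors"
	"fmt"
)

// selectorFunc stands in for a compiled CEL selector. A real implementation
// would use a CEL library; here it is just a Go function (hypothetical).
type selectorFunc func(attrs map[string]any) (any, error)

// matchDevice evaluates the selector. A runtime error or a non-boolean
// result counts as a failure: the patch does not apply and the failure is
// passed to record. A clean false is a non-match and is not recorded.
func matchDevice(sel selectorFunc, attrs map[string]any, record func(string)) bool {
	result, err := sel(attrs)
	if err != nil {
		record("CEL evaluation failed: " + err.Error())
		return false
	}
	b, ok := result.(bool)
	if !ok {
		record(fmt.Sprintf("CEL expression returned %T, expected bool", result))
		return false
	}
	return b
}

func main() {
	var events []string
	record := func(msg string) { events = append(events, msg) }

	// Analogous to device.attributes["dra.example.com"].someField with a missing key.
	missingKey := func(attrs map[string]any) (any, error) {
		v, ok := attrs["someField"]
		if !ok {
			return nil, errors.New("no such key: someField")
		}
		return v, nil
	}

	fmt.Println(matchDevice(missingKey, map[string]any{}, record))                   // false
	fmt.Println(matchDevice(missingKey, map[string]any{"someField": "yes"}, record)) // false
	fmt.Println(matchDevice(missingKey, map[string]any{"someField": true}, record))  // true
	fmt.Println(len(events))                                                         // 2
}
```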
used for scheduling new pods. In addition, pods already running with access to
a tainted device can be stopped automatically. Cluster administrators can do
the same by creating a
[ResourceSliceOverride](../5027-dra-admin-controlled-device-attributes] with a
Suggested change:
- [ResourceSliceOverride](../5027-dra-admin-controlled-device-attributes] with a
+ [ResourceSliceOverride](../5027-dra-admin-controlled-device-attributes) with a
Fixed.
// The name of each attribute must be unique in that set and
// include the domain prefix.
//
// The maximum number of attributes and capacities combined is 32.
Can we clarify here whether this limit applies only to an individual DevicePatch or across the ResourceSlice and all the DevicePatches for a particular device?
It only applies here. Will clarify.
Done.
One-line PR description: DRA: admin-controlled device attributes + device taints
Issue links:
Other comments: first revision
/cc @johnbelamaric