virt_mshv_vtl: Proxy irr filtering #609
base: main
Conversation
@@ -691,6 +693,17 @@ impl<'p, T: Backing> Processor for UhProcessor<'p, T> {
    }

    for vtl in [GuestVtl::Vtl1, GuestVtl::Vtl0] {
        #[cfg(guest_arch = "x86_64")]
There are a lot of `guest_arch` cfgs throughout this PR; how many are actually needed for the code to compile on all platforms? We generally prefer to have as few of these as possible.
The implementation is only for x86, and the invocations related to filtering happen to be in a common path, hence I had to add the `guest_arch` cfgs. I added those after compiling for both aarch64 and x64-cvm. I will take a look at whether I can reduce them further.
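One way to keep the cfgs out of the shared paths is to put the cfg on a single helper definition and give the other architectures a no-op stub, so the call sites stay cfg-free. A minimal sketch of that pattern, with purely illustrative names (not the PR's):

```rust
// Hypothetical helper names; the point is that the cfg lives on the definition,
// not at every call site in the common path.
#[cfg(guest_arch = "x86_64")]
fn update_proxy_irr_filter_if_needed(filter_dirty: bool) {
    if filter_dirty {
        // x86-only: refresh the local proxy IRR filter here.
    }
}

#[cfg(not(guest_arch = "x86_64"))]
fn update_proxy_irr_filter_if_needed(_filter_dirty: bool) {
    // Proxy IRR filtering does not exist on other guest architectures; no-op.
}

fn shared_vp_entry(filter_dirty: bool) {
    // The shared path stays free of guest_arch cfgs.
    update_proxy_irr_filter_if_needed(filter_dirty);
}
```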
@@ -37,6 +37,8 @@ use super::UhVpInner;
use crate::GuestVsmState;
use crate::GuestVtl;
use crate::WakeReason;
#[cfg(guest_arch = "x86_64")]
use bitvec::prelude::*;
Nit: don't use glob imports.
Also if this is going to stay a guest_arch cfg'd import it can move into the cfg_if block above.
openhcl/virt_mshv_vtl/src/lib.rs
Outdated
/// New instance for requested VP count
fn new(vp_count: u32) -> Self {
    DeviceIrrFilter {
        device_irr_filter: BitVec::repeat(false, 256).into_boxed_bitslice(),
If this is always going to be a constant size, should it be a `BitArray` instead?
This PR is only for the x86 platform, and yes, on x86 the IRR bitmap will always be a constant 256 bits. I will change this to `BitArray`.
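For reference, a fixed 256-bit bitmap in the `bitvec` crate can be a `BitArray` backed by `[u32; 8]`, which keeps the storage inline instead of heap-allocating like `BitVec::repeat(..).into_boxed_bitslice()`. A minimal sketch (the type alias and vector values are illustrative, not from the PR):

```rust
use bitvec::prelude::*;

// 256 vectors fit in 8 x u32 words; BitArray keeps the storage inline.
type IrrBitmap = BitArray<[u32; 8], Lsb0>;

fn main() {
    let mut filter: IrrBitmap = BitArray::ZERO;
    filter.set(0x30, true); // mark vector 0x30 as expected (illustrative value)

    // Iterate the set bits, e.g. to program the per-VP filter later.
    for vector in filter.iter_ones() {
        println!("allowed vector {vector:#x}");
    }
}
```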
openhcl/virt_mshv_vtl/src/lib.rs
Outdated
}

/// Mark the completion for `proxy_irr_filter` update for VP
fn clr_vp_proxy_irr_filter_update(&self, vp_index: u32) {
Nit: don't shorten words, just make it 'clear'
@@ -223,6 +275,9 @@ struct UhPartitionInner {
    no_sidecar_hotplug: AtomicBool,
    use_mmio_hypercalls: bool,
    backing_shared: BackingShared,
    #[inspect(skip)]
This feels like something we'd want wired up to inspect
openhcl/virt_mshv_vtl/src/lib.rs
Outdated
/// For requester VP to issue `proxy_irr_filter` update to other VPs
#[cfg(guest_arch = "x86_64")]
fn request_proxy_irr_filter_update(&self, vtl: GuestVtl, device_vector: u8, req_vp_index: u32) {
    tracing::info!(
How frequently is this expected to happen? I feel like the tracing level should be debug or trace. Info is turned on by default, will this be too noisy?
Applies throughout
Thanks for catching this. It's not going to be frequent (one-time SINT MSR writes and device retargets), but `debug` is the right level here. Will change.
openhcl/virt_mshv_vtl/src/lib.rs
Outdated
// excluding the requester VP (the requester itself takes care of updating its filter)
device_irr[vtl].set_vps_proxy_irr_filter_update(req_vp_index);

// Wake all the VPs; once a VP wakes up, it will query whether `proxy_irr_filter`
We should drop the lock before waking
Makes sense. Will scope the lock to cover only the writes of the device vector and the VPs bitmap.
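For illustration, the scoping would look something like the following sketch, using `std::sync::RwLock` and illustrative names rather than the PR's actual types:

```rust
use std::sync::RwLock;

struct DeviceIrrState {
    device_irr_filter: [u32; 8],
    // ...plus whatever per-VP update tracking is needed
}

fn request_filter_update(state: &RwLock<DeviceIrrState>, vector: u8, wake_other_vps: impl Fn()) {
    {
        // Scope the write guard so only the bitmap update runs under the lock.
        let mut guard = state.write().unwrap();
        guard.device_irr_filter[usize::from(vector) / 32] |= 1u32 << (vector % 32);
    } // guard dropped here

    // Wake the other VPs without holding the lock, so they can take it immediately.
    wake_other_vps();
}
```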
openhcl/virt_mshv_vtl/src/lib.rs
Outdated
@@ -1609,6 +1721,8 @@ impl<'a> UhProtoPartition<'a> {
    no_sidecar_hotplug: params.no_sidecar_hotplug.into(),
    use_mmio_hypercalls: params.use_mmio_hypercalls,
    backing_shared: BackingShared::new(isolation, BackingSharedParams { cvm_state })?,
    #[cfg(guest_arch = "x86_64")]
    device_irr_filter: RwLock::new(VtlArray::from_fn(|_| DeviceIrrFilter::new(vps_count))),
Currently we're only proxying IRRs for VTL 0, so this probably doesn't need to be a `VtlArray`; it can just be a single instance.
openhcl/hcl/src/ioctl.rs
Outdated
@@ -1854,10 +1855,25 @@ impl<'a, T: Backing> ProcessorRunner<'a, T> {
            *r = irr.swap(0, Ordering::Relaxed);
        }
    }

    // `proxy_irr` received from host is untrusted, only allow vectors that L2 expects
    for (f, v) in self.run.as_ref().proxy_irr_filter.iter().zip(r.iter_mut()) {
Do we need this anymore? I thought the kernel does the filtering now.
Kernel filtering of the vectors would definitely be the ideal thing to do, and with that in mind I added the `proxy_irr_filter` field to the `hcl_run` page, so that the kernel can use it later (if/when needed). My understanding is that even when the kernel opts to do IRR filtering, it has to somehow know which vectors a HW-isolated CVM partition actually expects, and that information is only available in user mode, i.e. the OpenVMM app (SINT and retarget intercepts).
Is there a branch that I can refer to where such work is in progress? If that work is already in flight, then I agree we don't need this PR.
Ah, to be clear, I just meant this specific code path, which performs the filtering. Some recent kernel changes added filtering in the kernel but did not update user mode to configure it. See microsoft/OHCL-Linux-Kernel@10f345a.
So the rest of the code, to configure the filtering, is still required.
Looks like the user mode part of this isn't merged yet. You probably want to rebase on #124, which I'll try to push to get merged today.
Perfect! Thanks for the quick response and for tagging the corresponding kernel change.
I will put a note above this change, something like: N.B. This code will be redundant and should be removed once the kernel IRR bitmap filtering is merged.
@jstarks, I looked at the changes in #124 and can see a new field `proxy_irr_blocked` in the `hcl_run` page. If my understanding is correct, this field has the reverse polarity of what I defined, i.e. the bits for vectors that need to be blocked should be set here (and I assume that's how the kernel uses it before writing to `proxy_irr`)?
If that's correct, then as part of this PR, once I rebase onto #124, I should initialize `proxy_irr_blocked` to all 1s and then clear the bits for trusted vectors. Right?
That's right. (We used reverse polarity to avoid a breaking change.)
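So, under the #124 polarity, the setup sketched in this exchange would be roughly the following (bit math and vector values are illustrative; the real code writes into the run page):

```rust
fn main() {
    // proxy_irr_blocked is a block-list: a set bit means "drop this vector",
    // so start with every vector blocked...
    let mut proxy_irr_blocked: [u32; 8] = [!0u32; 8];

    // ...then clear only the bits for vectors the guest is known to expect.
    let trusted_vectors: [u8; 2] = [0x30, 0x41]; // illustrative values
    for vector in trusted_vectors {
        proxy_irr_blocked[usize::from(vector) / 32] &= !(1u32 << (vector % 32));
    }

    // Vector 0x30 (word 1, bit 16) is now allowed through.
    assert_eq!(proxy_irr_blocked[1] & (1 << 16), 0);
}
```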
#[cfg(guest_arch = "x86_64")]
if self.partition.isolation.is_hardware_isolated() {
    // Complete any proxy filter update if required
    self.partition.complete_vp_proxy_filter_update(
This takes a lock (a read lock, but still) for every VP entry. I think we need to avoid this. Can you instead add a wake reason for updating the filter?
Actually, that's a great suggestion! With this new wake reason, which is per-VP, I can entirely get rid of the `proxy_irr_filter_update_vps` bitmap (which was needed for request tracking), and combining that with @smalis-msft's suggestion (as device retarget is only for VTL0) I think I can fully get rid of `DeviceIrrFilter`, keep only a `device_irr_filter` array in `UhPartitionInner`, and move all the helper methods to `UhPartitionInner`.
Please do share your thoughts on this.
Sounds good!
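A rough sketch of the shape of that design, with a simplified stand-in for the wake-reason plumbing (none of these names or representations are from the PR; the real `WakeReason` handling lives in the VP run loop):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const WAKE_UPDATE_PROXY_IRR_FILTER: u32 = 1 << 0; // illustrative bit assignment

struct VpInner {
    wake_reasons: AtomicU32,
}

// Requester side: flag every other VP and wake it; no per-VP tracking bitmap needed.
fn request_filter_update(vps: &[VpInner], requester: usize, wake: impl Fn(usize)) {
    for (index, vp) in vps.iter().enumerate() {
        if index != requester {
            vp.wake_reasons.fetch_or(WAKE_UPDATE_PROXY_IRR_FILTER, Ordering::Relaxed);
            wake(index);
        }
    }
}

// Target side: on wakeup, consume the reason and rebuild the local filter
// from the partition-wide device_irr_filter.
fn handle_wake(vp: &VpInner, rebuild_filter: impl FnOnce()) {
    let reasons = vp.wake_reasons.swap(0, Ordering::Relaxed);
    if reasons & WAKE_UPDATE_PROXY_IRR_FILTER != 0 {
        rebuild_filter();
    }
}
```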
…R filtering and other review feedback implementation
let run = MappedPage::new(fd, vp as i64).map_err(|e| Error::MmapVp(e, None))?;
let run: MappedPage<hcl_run> =
    MappedPage::new(fd, vp as i64).map_err(|e| Error::MmapVp(e, None))?;
// SAFETY: Initializing `proxy_irr_blocked` to block all initially
That's not how SAFETY comments work. SAFETY comments are intended to explain why the unsafe operation you're performing does not violate any of Rust's safety rules. If you're interested, I'd suggest reading chapter 1 of the nomicon (https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html). But for this case it's probably enough to look up other places where we touch the run page and mimic what they say.
@@ -1857,6 +1863,20 @@ impl<'a, T: Backing> ProcessorRunner<'a, T> {
        }
    }

    /// Update the `proxy_irr_blocked` in run page
    pub fn update_proxy_irr_filter(&mut self, irr_filter: &BitArray<[u32; 8], Lsb0>) {
        // SAFETY: `proxy_irr_blocked` is accessed in both user and kernel, but from current VP
I'd suggest saying something like "SAFETY: `proxy_irr_blocked` is not accessed by any other VPs, so we know we have exclusive access".
I don't think this is true. The kernel will access it in an interrupt context (only from the current VP, yes), but we still need to use at least relaxed atomic reads/writes to access the values to satisfy the Rust and Linux kernel memory models.
Ahh, that's right, thanks for catching this. OpenVMM has a user/kernel model, so yes, I agree: the same VP running in user context can be interrupted, and the kernel may then access this field. Will move to atomics, the same way it's done when accessing `proxy_irr`.
for irr_bit in irr_filter.iter_ones() {
    tracing::debug!(irr_bit, "update_proxy_irr_filter");
    proxy_irr_blocked[irr_bit >> 5] &= !(1 << (irr_bit & 0x1F));
Can we get a comment explaining this math? Why are we shifting things?
The compiler can be relied upon to strength-reduce a constant power-of-two division/remainder to a shift/and, so I'd probably write this as `proxy_irr_blocked[irr_bit / 32] &= !(1 << (irr_bit % 32))` to make this clearer.
Having said that, it seems like you can just directly store `irr_filter.into_inner().map(|v| !v)` into `proxy_irr_blocked` rather than do this iteration. That seems a lot simpler (maybe just a little more complicated once you switch to using atomics).
I'd also suggest just taking a `&[u32; 8]` as a parameter rather than adding `BitArray` to the public interface for this crate.
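Putting those suggestions together, the update could take a plain `&[u32; 8]` allow-bitmap, invert it wholesale, and store it with relaxed atomics. A hedged sketch, assuming the run-page field can be treated as `[AtomicU32; 8]` the same way `proxy_irr` already is (the struct here is an illustrative stand-in, not the real `hcl_run` layout):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Illustrative stand-in for the kernel-shared run-page field.
struct RunPageIrrBlocked {
    proxy_irr_blocked: [AtomicU32; 8],
}

fn update_proxy_irr_filter(run: &RunPageIrrBlocked, allowed: &[u32; 8]) {
    // The kernel can read this field from interrupt context on the same VP,
    // so use (at least) relaxed atomic stores rather than plain writes.
    // The run page stores a *blocked* mask, so the allow-bitmap is inverted
    // word by word instead of clearing bits one at a time.
    for (slot, &word) in run.proxy_irr_blocked.iter().zip(allowed) {
        slot.store(!word, Ordering::Relaxed);
    }
}
```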
#[cfg(guest_arch = "x86_64")]
if wake_reasons.update_proxy_irr_filter()
    && self.partition.isolation.is_hardware_isolated()
Should this wake reason ever get set on non-isolated?
Currently we are only issuing this wake request from `impl<T: CpuIo, B: HardwareIsolatedBacking> UhHypercallHandler<'_, '_, T, B>::retarget_physical_interrupt`, so this will never be set on non-isolated.
Do we need to check that we're isolated in this if then?
I think I agree it's redundant; we can get rid of it. I was just making it more explicit.
You can put an assertion if you want.
// If the updated MSR is a SynIC MSR, then check if it's proxy or the previous value was proxy;
// in either case, we need to update the `proxy_irr_blocked`
let mut irr_filter_update = false;
if matches!(msr, hvdef::HV_X64_MSR_SINT0..=hvdef::HV_X64_MSR_SINT15) {
If we change the order of these ifs, can we then merge the two that check hv?
Implementation for issues: #554 #563
The filter is kept updated for all VPs (i.e. during `SINT` updates and the `HvCallRetargetDeviceInterrupt` hypercall) and applied during IRR bitmap collection, i.e. `ProcessorRunner::proxy_irr()` is extended to apply `proxy_irr_filter` before returning the final IRR bitmap.
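For reference, the filtering step described above amounts to a per-word AND of the host-provided IRR with the allow-bitmap before it is returned. A minimal standalone sketch of that masking (the real implementation reads both sides from the run page; the vector values here are illustrative):

```rust
/// Mask the untrusted host-provided IRR words with the allow-bitmap,
/// so only vectors the guest expects survive.
fn apply_proxy_irr_filter(irr: &mut [u32; 8], filter: &[u32; 8]) {
    for (word, allowed) in irr.iter_mut().zip(filter) {
        *word &= allowed;
    }
}

fn main() {
    let mut irr = [0u32; 8];
    irr[1] = 1 << 16; // host asserted vector 0x30 (expected)
    irr[3] = 1 << 5;  // ...and vector 0x65 (unexpected)

    let mut filter = [0u32; 8];
    filter[1] = 1 << 16; // guest only expects vector 0x30

    apply_proxy_irr_filter(&mut irr, &filter);
    assert_eq!(irr[3], 0); // the unexpected vector was dropped
    assert_eq!(irr[1], 1 << 16); // the expected vector survived
}
```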