> On 6/16/23 20:07, Sean Christopherson wrote:
> > On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> >> On 6/16/23 15:56, Sean Christopherson wrote:
> >>> On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> >>>> Again, pedantic mode on, I find it difficult to agree with the wording
> >>>> that the guest owns "most of" the HW resources it uses. It controls the
> >>>> data communication with its hardware device, but other resources (e.g.
> >>>> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> >>>> host and virtualized by it for the guest.
> >>>
> >>> I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> >>> host does *not* own most resources that are exposed to the guest. My understanding
> >>> is that everything in your list is owned by the trusted hypervisor in the pKVM model.
> >>
> >> Heh, no. Most of these resources are owned by the untrusted host, that's
> >> the point.
> >
> > Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is
> > that in pKVM, while the untrusted host can withhold resources, it can't subvert
> > most of those resources. Taking scheduling as an example, a pKVM vCPU may be
> > migrated to a different pCPU by the untrusted host, but pKVM ensures that it is
> > safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary
> > VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data.
>
> Yep, agree.
>
> >> Basically for two reasons: 1. we want to keep the trusted hypervisor as
> >> simple as possible. 2. we don't need availability guarantees.
> >>
> >> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
> >> counterparts on non-Intel), physical PCI config space (merely for
> >> controlling a few critical registers like BARs and MSI address
> >> registers), perhaps a few more things that don't come to my mind now.
> >
> > The "physical PCI config space" is a key difference, and is very relevant to this
> > doc (see my response to Allen).
>
> Yeah, thanks for the links and the context, BTW.
>
> But let me clarify that we have 2 things here that should not be
> confused with each other. We have 2 levels of virtualization of the PCI
> config space in pKVM. The hypervisor traps the host's accesses to the
> config space, but mostly it simply passes them through to hardware. Most
> importantly, when the host reprograms a BAR, the hypervisor makes sure
> to update the corresponding MMIO mappings in the host's and the guest's
> 2nd-level page tables (that is what makes protection of the protected
> guest's passthrough PCI devices possible at all). But essentially it's
> the host that manages the physical config space. And the host, in turn,
> virtualizes it for the guest, using vfio-pci, like it is traditionally
> done for passthrough PCI devices.
>
> This latter, emulated config space is the concern. Looking at the
> patches [1] and thinking about whether those MSI-X misconfiguration
> attacks are possible in pKVM, I come to the conclusion that yes, they are.
>
> Device attestation helps with trusting/verifying static information, but
> the dynamically changing config space is something different.
>
> So it seems that such "emulated PCI config misconfiguration attacks"
> need to be included in the threat model for pKVM as well, i.e. need to
> be hardened against on the guest side.
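
To make this a bit more concrete for myself: on the guest side such
hardening would presumably come down to sanity-checking whatever the
host-emulated config space reports before the guest acts on it. A purely
illustrative sketch for the MSI-X capability (the function name and the
policy are made up, this is not existing code):

        #include <linux/pci.h>

        /*
         * Illustrative only: reject MSI-X capability values from the
         * (host-emulated, hence untrusted) config space that would place
         * the MSI-X table outside the BAR it claims to live in.
         */
        static int sanity_check_untrusted_msix(struct pci_dev *dev)
        {
                int pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
                u32 table, offset, nr_entries;
                u16 control;
                u8 bir;

                if (!pos)
                        return -ENODEV;

                /* Both reads hit the config space emulated by the untrusted host. */
                pci_read_config_word(dev, pos + PCI_MSIX_FLAGS, &control);
                pci_read_config_dword(dev, pos + PCI_MSIX_TABLE, &table);

                nr_entries = (control & PCI_MSIX_FLAGS_QSIZE) + 1;
                bir = table & PCI_MSIX_TABLE_BIR;
                offset = table & PCI_MSIX_TABLE_OFFSET;

                /* MSI-X table entries are 16 bytes each. */
                if (bir > 5 ||
                    (u64)offset + (u64)nr_entries * 16 > pci_resource_len(dev, bir))
                        return -EINVAL;

                return 0;
        }

(Where such a check would actually live is of course a separate question.)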
> Unless we revisit our current design
> assumptions for device assignment in pKVM on x86 and manage the physical
> PCI config in the trusted hypervisor, not in the host (with all the
> increasing complexity that comes with that, related to power management
> and other things).

Thank you very much for the clarification, Dmytro, on this and many other
points when it comes to pKVM. It does help greatly to get us on the same page.

> Also, thinking more about it: irrespective of passthrough devices, I
> guess that the protected pKVM guest may well want to use virtio with PCI
> transport (not for things like networking, but that's not the point),
> and thus be prone to the same attacks.
>
> >> The untrusted host schedules its guests on physical CPUs (i.e. the
> >> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
> >> has no scheduling, it only handles vmexits from the host and guests. The
> >> untrusted host fully controls the physical interrupt controllers (I
> >> think we realize that is not perfectly fine, but here we are), etc.
> >
> > Yeah, IRQs are a tough nut to crack.
>
> And BTW, doesn't it mean that interrupts also need to be hardened in the
> guest (if we don't want the complexity of interrupt controllers in the
> trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
> should also consider interrupt-based timing attacks, which could use
> any type of interrupt. (I have no idea how to harden either of the two
> cases, but I'm no expert.)

We have been thinking about this a bit, at least when it comes to our TDX
case. Two main issues were identified: interrupts contributing to the state
of the Linux PRNG [1], and the potential implications of missing interrupts
for reliable panic and other kernel use cases [2].

[1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#reliable-panic

For the first one, in addition to simply enforcing the usage of RDSEED for
TDX guests, we still want to do a proper evaluation of the security of the
Linux PRNG under our threat model. The second one is harder to reliably
assess IMO, but so far we have not been able to find any concrete attack
vectors. It would be good if people who have expertise in this could take a
look at the assessment we did. The logic was to go over all kernel core
callers of the various smp_call_function*() and on_each_cpu*() helpers and
check the implications if such an IPI is never delivered (a sketch of the
pattern is in the P.S. below).

Best Regards,
Elena.
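
P.S. To illustrate the pattern we went over in that assessment (a made-up
example, not an actual kernel call site): a synchronous cross-CPU call with
wait=1 does not return until the handler has run on the target CPU, so a
never-delivered IPI leaves the caller stuck waiting. That is a denial of
service, but so far nothing worse that we could find.

        #include <linux/smp.h>

        static void remote_work(void *info)
        {
                /* Runs in IPI context on the target CPU. */
        }

        static void example_sync_cross_call(int target_cpu)
        {
                /*
                 * wait=1: the caller does not return until remote_work()
                 * has completed on target_cpu. If the IPI is never
                 * delivered, the caller spins here indefinitely.
                 */
                smp_call_function_single(target_cpu, remote_work, NULL, 1);
        }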