On Fri, May 03, 2024, Mickaël Salaün wrote:
> Add an interface for user space to be notified about guests' Heki policy
> and related violations.
>
> Extend the KVM_ENABLE_CAP IOCTL with KVM_CAP_HEKI_CONFIGURE and
> KVM_CAP_HEKI_DENIAL. Each one takes a bitmask as first argument that can
> contains KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4. The
> returned value is the bitmask of known Heki exit reasons, for now:
> KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.
>
> If KVM_CAP_HEKI_CONFIGURE is set, a VM exit will be triggered for each
> KVM_HC_LOCK_CR_UPDATE hypercalls according to the requested control
> register. This enables to enlighten the VMM with the guest
> auto-restrictions.
>
> If KVM_CAP_HEKI_DENIAL is set, a VM exit will be triggered for each
> pinned CR violation. This enables the VMM to react to a policy
> violation.
>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Madhavan T. Venkataraman <madvenka@xxxxxxxxxxxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxx>
> Link: https://lore.kernel.org/r/20240503131910.307630-4-mic@xxxxxxxxxxx
> ---
>
> Changes since v1:
> * New patch. Making user space aware of Heki properties was requested by
>   Sean Christopherson.

No, I suggested having userspace _control_ the pinning[*], not merely be
notified of pinning.

: IMO, manipulation of protections, both for memory (this patch) and CPU state
: (control registers in the next patch) should come from userspace. I have no
: objection to KVM providing plumbing if necessary, but I think userspace needs
: to have full control over the actual state.
:
: One of the things that caused Intel's control register pinning series to stall
: out was how to handle edge cases like kexec() and reboot. Deferring to userspace
: means the kernel doesn't need to define policy, e.g. when to unprotect memory,
: and avoids questions like "should userspace be able to overwrite pinned control
: registers".
:
: And like the confidential VM use case, keeping userspace in the loop is a big
: benefit, e.g. the guest can't circumvent protections by coercing userspace into
: writing to protected memory.

I stand by that suggestion, because I don't see a sane way to handle things like
kexec() and reboot without having a _much_ more sophisticated policy than would
ever be acceptable in KVM.

I think that can be done without KVM having any awareness of CR pinning
whatsoever. E.g. userspace just needs the ability to intercept CR writes and
inject #GPs. Off the cuff, I suspect the uAPI could look very similar to MSR
filtering. E.g. I bet userspace could enforce MSR pinning without any new KVM
uAPI at all.

[*] https://lore.kernel.org/all/ZFUyhPuhtMbYdJ76@xxxxxxxxxx
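
For illustration only, a minimal sketch of what the userspace side of
"intercept CR writes and inject #GPs" might look like. This is a guess at the
policy logic, not an existing or proposed KVM uAPI: it assumes the VMM is
somehow handed each intercepted CR write, and struct cr_pin_policy, cr_pin(),
cr_write_allowed() and cr_unpin_all() are hypothetical names. Only the CR0/CR4
bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural control register bits. */
#define CR0_WP   (1ULL << 16)
#define CR4_SMEP (1ULL << 20)
#define CR4_SMAP (1ULL << 21)

/* Hypothetical VMM-side record for one control register; not a KVM struct. */
struct cr_pin_policy {
	uint64_t pinned_mask;	/* bits the guest asked to lock */
	uint64_t pinned_value;	/* value those bits must keep */
};

/* Hypothetical: lock the bits in 'mask' at the value they have in 'cur'. */
static void cr_pin(struct cr_pin_policy *p, uint64_t mask, uint64_t cur)
{
	p->pinned_mask |= mask;
	p->pinned_value = (p->pinned_value & ~mask) | (cur & mask);
}

/*
 * Hypothetical: called on an intercepted CR write. Returns true if the write
 * leaves every pinned bit unchanged; on false the VMM would inject #GP
 * instead of applying the write.
 */
static bool cr_write_allowed(const struct cr_pin_policy *p, uint64_t new_val)
{
	return ((new_val ^ p->pinned_value) & p->pinned_mask) == 0;
}

/*
 * Hypothetical: drop all pins. Deciding when this is legitimate (reboot,
 * kexec) is exactly the policy question that would stay in userspace.
 */
static void cr_unpin_all(struct cr_pin_policy *p)
{
	p->pinned_mask = 0;
	p->pinned_value = 0;
}
```

In this sketch a VMM would keep one such record per pinned register per vCPU,
call cr_pin() when the guest requests a lock, and consult cr_write_allowed()
on every intercepted write.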