On Tue, Apr 23, 2024, Wei W Wang wrote:
> On Tuesday, April 23, 2024 3:44 AM, Sean Christopherson wrote:
> > On Mon, Apr 22, 2024, Wei Wang wrote:
> > > If the bits set by userspace to the guest's MSR_IA32_ARCH_CAPABILITIES
> > > are not supported by KVM, fails the write. This safeguards against the
> > > launch of a guest with a feature set, enumerated via
> > > MSR_IA32_ARCH_CAPABILITIES, that surpasses the capabilities supported
> > > by KVM.
> >
> > I'm not entirely certain KVM cares. Similar to guest CPUID, advertising
> > features to the guest that are unbeknownst may actually make sense in some
> > scenarios, e.g. if userspace learns of yet another "NO" bit that says a
> > CPU isn't vulnerable to some flaw.
>
> I think it might be more appropriate for the guest to see the "NO" bit only when
> the host, such as the hardware (i.e., host_arch_capabilities), already supports it.
> Otherwise, the guest could be misled by a false "NO" bit. For instance, the guest
> might assume it's not vulnerable to a certain flaw as it sees the "NO" bit from the
> MSR, even though the enhancement feature isn't actually supported by the host,
> and thus bypass a workaround (to the vulnerability) it should have used. This could
> arise with a faulty or compromised userspace.
> Another scenario pertains to guest live migration: the source platform physically
> supports the "NO" bit, but the destination platform does not. If KVM fails the MSR
> write here, it could prevent such a live migration from proceeding.
>
> So I think it might be prudent for KVM to perform this check. This is similar to the
> MSR_IA32_PERF_CAPABILITIES case that we have implemented.

PERF_CAPABILITIES is a bad example. KVM ended up enforcing the incoming value
through a series of fixes, not because of a conscious design choice. Though to
be fair, we might still have decided to enforce the supported capabilities since
KVM heavily consumes PERF_CAPABILITIES.

> > ARCH_CAPABILITIES is read-only, i.e. KVM _can't_ shove it into hardware. So
> > as long as KVM treats the value as "untrusted", like KVM does for guest CPUID,
> > I think the current behavior is actually ok.
>
> Yes, the value coming from userspace could be considered "untrusted", but should
> KVM ensure to expose a trusted/reliable value to the guest?

No, the VMM is firmly in the guest's TCB.

We have general consensus that KVM should enforce an architecturally consistent
model[1] (there was a deeper PUCK discussion on this, I think, but I can't find
the notes offhand). But even in that case the reasoning isn't that userspace
isn't trusted, it's that trying to allow userspace to do MSR writes that
architecturally should fail, while disallowing the same writes from the guest,
is unnecessarily complex and not maintainable. And, there is no use case for
inconsistent setups that is remotely plausible.

ARCH_CAPABILITIES is different. Like CPUID, KVM itself isn't negatively
affected by userspace enumerating unsupported bits. And like CPUID[2], there
are plausible scenarios where enumerating unsupported bits would actually make
sense, e.g. if userspace is enumerating an FMS that is not the actual hardware
FMS, and based on FMS the guest may incorrectly think it needs to mitigate a
vulnerability that isn't actually relevant.

All that said, I'm not completely opposed to enforcing ARCH_CAPABILITIES, but I
would prefer to do so if and only if there's an actual benefit/need to do so.

[1] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@xxxxxxxxxx
[2] https://lore.kernel.org/all/ZC4qF90l77m3X1Ir@xxxxxxxxxx
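
For illustration only, the two behaviors being weighed in this thread can be
sketched in plain C. This is not KVM code: the function names, the `supported`
mask parameter, and the `guest_val` storage are all hypothetical stand-ins, and
the real MSR write path has more context than is shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical "strict" policy, roughly what the patch proposes:
 * reject a userspace write that sets any bit outside the supported mask.
 * Returns true if the value was accepted and stored.
 */
static bool sketch_set_arch_caps_strict(uint64_t *guest_val, uint64_t data,
                                        uint64_t supported)
{
    if (data & ~supported)
        return false;   /* unsupported bit set: fail the write */

    *guest_val = data;
    return true;
}

/*
 * Hypothetical "permissive" policy, roughly the current behavior as
 * described above: store whatever userspace provides. The value is only
 * read back by the guest and is never loaded into hardware, so the
 * hypervisor itself must treat it as untrusted.
 */
static bool sketch_set_arch_caps_permissive(uint64_t *guest_val, uint64_t data)
{
    *guest_val = data;
    return true;
}
```

With a supported mask of 0xff, the strict variant would reject a value of
0x100 and leave the stored value untouched, while the permissive variant would
store it. The extra bit in that case stands in for a "NO" bit that userspace
knows about but the hypervisor does not.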