On Mon, Aug 05, 2024, mlevitsk@xxxxxxxxxx wrote: > У пн, 2024-08-05 у 09:39 -0700, Sean Christopherson пише: > > On Mon, Aug 05, 2024, mlevitsk@xxxxxxxxxx wrote: > > > У пт, 2024-08-02 у 08:53 -0700, Sean Christopherson пише: > > > > > > > Checking kvm_cpu_cap_has() is wrong. What the _host_ supports is irrelevant, > > > > > > > what matters is what the guest CPU supports, i.e. this should check guest CPUID. > > > > > > > Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting > > > > > > > on a bad load into hardware. Which means adding a "governed" feature until my > > > > > > > CPUID rework lands. > > > > > > Well the problem is that we passthrough these MSRs, and that means that the guest > > > can modify them at will, and only ucode can prevent it from doing so. > > > > > > So even if the 5 level paging is disabled in the guest's CPUID, but host supports it, > > > nothing will prevent the guest to write non canonical value to one of those MSRs, > > > and later KVM during migration or just KVM_SET_SREGS will fail. > > > > Ahh, and now I recall the discussions around the virtualization holes with LA57. > > > > > Thus I used kvm_cpu_cap_has on purpose to make KVM follow the actual ucode > > > behavior. > > > > I'm leaning towards having KVM do the right thing when emulation happens to be > > triggered. If KVM checks kvm_cpu_cap_has() instead of guest_cpu_cap_has() (looking > > at the future), then KVM will extend the virtualization hole to MSRs that are > > never passed through, and also to the nested VMX checks. Or I suppose we could > > add separate helpers for passthrough MSRs vs. non-passthrough, but that seems > > like it'd add very little value and a lot of maintenance burden. > > > > Practically speaking, outside of tests, I can't imagine the guest will ever care > > if there is inconsistent behavior with respect to loading non-canonical values > > into MSRs. > > > > Hi, > > If we weren't allowing the guest (and even nested guest assuming that L1 > hypervisor allows it) to write these MSRs directly, I would have agreed with > you, but we do allow this. > > This means that for example a L2, even a malicious L2, can on purpose write > non canonical value to one of these MSRs, and later on, KVM could kill the L0 L1? > due to canonical check. Ugh, right, if L1 manually saves/restores MSRs and happens to trigger emulation on WRMSR at the 'wrong" time. Host userspace save/restore would suffer the same problem. We could grant host userspace accesses an exception, but that's rather pointless. > Or L1 (not Linux, because it only lets canonical GS_BASE/FS_BASE), allow the > untrusted userspace to write any value to say GS_BASE, thus allowing > malicious L1 userspace to crash L1 (also a security violation). FWIW, I don't think this is possible. WR{FS,GS}BASE and other instructions that load FS/GS.base honor CR4.LA57, it's only WRMSR that does not. > IMHO if we really want to do it right, we need to disable pass-though of > these MSRs if ucode check is more lax than our check, that is if L1 is > running without 5 level paging enabled but L0 does have it supported. > > I don't know if this configuration is common, and thus how much this will > affect performance. MSR_FS_BASE and SR_KERNEL_GS_BASE are hot spots when WR{FS,GS}BASE are unsupported, or if the guest kernels doesn't utilize those instructions. All in all, I agree it's not worth trying to plug the virtualization hole for MSRs, especially since mimicking hardware yields much simpler code overall. E.g. add a dedicated MSR helper, and have that one check kvm_cpu_cap_has(), including in VM-Entry flows, but keep the existing is_noncanonical_address() for all non-WRMSR path. Something like this? static inline bool is_noncanonical_msr_value(u64 la) { u8 virt_addr_bits = kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48; return !__is_canonical_address(la, virt_addr_bits); }