On Tue, 2024-07-09 at 12:58 -0700, Sean Christopherson wrote: > On Thu, Jul 04, 2024, Maxim Levitsky wrote: > > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote: > > > Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's > > > reserved bits into the guest's reserved bits. This fixes a bug where VMX's > > > set_cr4_guest_host_mask() fails to account for KVM-reserved bits when > > > deciding which bits can be passed through to the guest. In most cases, > > > letting the guest directly write reserved CR4 bits is ok, i.e. attempting > > > to set the bit(s) will still #GP, but not if a feature is available in > > > hardware but explicitly disabled by the host, e.g. if FSGSBASE support is > > > disabled via "nofsgsbase". > > > > > > Note, the extra overhead of computing host reserved bits every time > > > userspace sets guest CPUID is negligible. The feature bits that are > > > queried are packed nicely into a handful of words, and so checking and > > > setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the > > > total cost will be in the noise even if the number of checked CR4 bits > > > doubles over the next few years. In other words, x86 will run out of CR4 > > > bits long before the overhead becomes problematic. > > > > It might be just me, but IMHO this justification is confusing, leading me to > > belive that maybe the code is on the hot-path instead. > > > > The right justification should be just that this code is in > > kvm_vcpu_after_set_cpuid is usually (*) only called once per vCPU (twice > > after your patch #1) > > Ya. I was trying to capture that even if that weren't true, i.e. even if userspace > was doing something odd, that the extra cost is irrelevant. I'll expand and reword > the paragraph to make it clear this isn't a hot path for any sane userspace. Thank you! > > > (*) Qemu also calls it, each time vCPU is hotplugged but this doesn't change > > anything performance wise. > > ... > > > > @@ -9831,10 +9826,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops) > > > if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES)) > > > kvm_caps.supported_xss = 0; > > > > > > -#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f) > > > - cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_); > > > -#undef __kvm_cpu_cap_has > > > - > > > if (kvm_caps.has_tsc_control) { > > > /* > > > * Make sure the user can only configure tsc_khz values that > > > > I mostly agree with this patch - caching always carries risks and when it doesn't > > value performance wise, it should always be removed. > > > > > > However I don't think that this patch fixes a bug as it claims: > > > > This is the code prior to this patch: > > > > kvm_x86_vendor_init -> > > > > r = ops->hardware_setup(); > > svm_hardware_setup > > svm_set_cpu_caps + kvm_set_cpu_caps > > > > -- or -- > > > > vmx_hardware_setup -> > > vmx_set_cpu_caps + + kvm_set_cpu_caps > > > > > > # read from 'kvm_cpu_caps' > > cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_); > > > > > > AFAIK kvm cpu caps are never touched outside of svm_set_cpu_caps/vmx_hardware_setup > > (they don't depend on some later post-processing, cpuid, etc). > > > > In fact a good refactoring would to make kvm_cpu_caps const after this point, > > using cast, assert or something like that. > > > > This leads me to believe that cr4_reserved_bits is computed correctly. > > cr4_reserved_bits is computed correctly. The bug is that cr4_reserved_bits isn't > consulted by set_cr4_guest_host_mask(), which is what I meant by "KVM-reserved > bits" in the changelog. Ah, I see it now. I also see that set_cr4_guest_host_mask, limits the guest owned bits to a small whitelist, and none of these bits looks scary, so it all make sense that this is mostly a theoretical bug, but for sure worth fixing. > > > I could be wrong, but then IMHO it is a very good idea to provide an explanation > > on how this bug can happen. > > The first paragraph of the changelog tries to do that, and I'm struggling to come > up with different wording that makes it more clear what's wrong. Any ideas/suggestions? > I also re-read it, and now it all makes sense. I guess I just somehow got fixed on thinking that cr4_reserved_bits was not computed incorrectly rather than just not used. The comment indeed now makes sense to me, so let it be as it is. Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Best regards, Maxim Levitsky