Re: [PATCH v3 1/4] KVM: x86: relax canonical check for some x86 architectural msrs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



У вт, 2024-08-20 у 15:13 +0300, mlevitsk@xxxxxxxxxx пише:
> У пт, 2024-08-16 у 14:49 -0700, Sean Christopherson пише:
> > > > > On Thu, Aug 15, 2024, Maxim Levitsky wrote:
> > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > > > index ce7c00894f32..2e83f7d74591 100644
> > > > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > > > @@ -302,6 +302,31 @@ const struct kvm_stats_header kvm_vcpu_stats_header = {
> > > > > > > > >                        sizeof(kvm_vcpu_stats_desc),
> > > > > > > > >  };
> > > > > > > > >  
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Most x86 arch MSR values which contain linear addresses like
> > > > > 
> > > > > Is it most, or all?  I'm guessing all?
> 
> I can't be sure that all of them are like that - there could be some outliers that behave differently.
> 
> One of the things my work at Intel taught me is that there is nothing consistent
> in x86 spec, anything is possible and nothing can be assumed.
> 
> I dealt only with those msrs, that KVM checks for canonicality, therefore I use the word 
> 'most'. There could be other msrs that are not known to me and/or to KVM.
> 
> I can write 'some' if you prefer.

Hi,


So I did some more reverse engineering and indeed, 'some' is the right word:

I audited all places in KVM which check an linear address for being canonical and this is what I found:

- MSR_IA32_BNDCFGS - since it is not supported on CPUs with 5 level paging, its not possible to know
  what the hardware does.


- MSR_IA32_DS_AREA: - Ignores CR4.LA57 as expected. Tested by booting into kernel with 5 level paging
  disabled and then using userspace 'wrmsr' program to set this msr.
  I attached the bash script that I used

- MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: - Exactly the same story, but for some reason the
  host doesn't suport (not even read) from MSR_IA32_RTIT_ADDR2_*, MSR_IA32_RTIT_ADDR3_*.
  Probably the system is not new enough for these.


- invpcid instruction. It is exposed to the guest without interception (unless !npt or !ept),
  and yes, it works just fine on 57-canonical address without CR4.LA57 set....


- invvpid - this one belongs to VMX set, so technically its for nesting although it is run by L1,
  it is always emulated by KVM, but still executed on the host just with different vpid,
  so I booted the host without 5 level paging, and patched KVM to avoid canonical check.

  Also 57-canonical adddress worked just fine, and fully non canonical address failed.
  and gave a warning in 'invvpid_error'


Should I fix all of these too?


About fixing the emulator this is what see:

	emul_is_noncanonical_address
		__load_segment_descriptor
			load_segment_descriptor
				em_lldt
				em_ltr

		em_lgdt_lidt



While em_lgdt_lidt should be easy to fix because it calls emul_is_noncanonical_address
directly, the em_lldt, em_ltr will be harder because these use load_segment_descriptor
which calls __load_segment_descriptor which in turn is also used for emulating of far jumps/calls/rets,
for which I do believe that canonical check does respect CR4.LA57, but can't be sure either.

It is possible that far jumps/calls/rets also ignore CR4.LA57, and instead set RIP to
non canonical instruction, and then on first fetch, #GP happens.

I'll setup another unit test for this. RIP of the #GP will determine if the instruction
failed or the next fetch.

Best regards,
	Maxim Levitsky


> 
> > > > > 
> > > > > > > > > + * segment bases, addresses that are used in instructions (e.g SYSENTER),
> > > > > > > > > + * have static canonicality checks,
> > > > > 
> > > > > Weird and early line breaks.
> > > > > 
> > > > > How about this?
> > > > > 
> > > > > /*
> > > > >  * The canonicality checks for MSRs that hold linear addresses, e.g. segment
> > > > >  * bases, SYSENTER targets, etc., are static, in the sense that they are based
> > > > >  * on CPU _support_ for 5-level paging, not the state of CR4.LA57.
> > > > > 
> > > > > > > > > + * size of whose depends only on CPU's support for 5-level
> > > > > > > > > + * paging, rather than state of CR4.LA57.
> > > > > > > > > + *
> > > > > > > > > + * In addition to that, some of these MSRS are directly passed
> > > > > > > > > + * to the guest (e.g MSR_KERNEL_GS_BASE) thus even if the guest
> > > > > > > > > + * doen't have LA57 enabled in its CPUID, for consistency with
> > > > > > > > > + * CPUs' ucode, it is better to pivot the check around host
> > > > > > > > > + * support for 5 level paging.
> > > > > 
> > > > > I think we should elaborate on why it's better.  It only takes another line or
> > > > > two, and that way we don't forget the edge cases that make properly emulating
> > > > > guest CPUID a bad idea.
> 
> OK, will do.
> 
> > > > > 
> > > > >  * This creates a virtualization hole where a guest writes to passthrough MSRs
> > > > >  * may incorrectly succeed if the CPU supports LA57, but the vCPU does not
> > > > >  * (because hardware has no awareness of guest CPUID).  Do not try to plug this
> > > > >  * hole, i.e. emulate the behavior for intercepted accesses, as injecting #GP
> > > > >  * depending on whether or not KVM happens to emulate a WRMSR would result in
> > > > >  * non-deterministic behavior, and could even allow L2 to crash L1, e.g. if L1
> > > > >  * passes through an MSR to L2, and then tries to save+restore L2's value.
> > > > >  */
> > > > > 
> > > > > > > > > +
> > > > > > > > > +static u8  max_host_supported_virt_addr_bits(void)
> > > > > 
> > > > > Any objection to dropping the "supported", i.e. going with max_host_virt_addr_bits()?
> > > > > Mostly to shorten the name, but also because "supported" suggests there's software
> > > > > involvement, e.g. the max supported by the kernel/KVM, which isn't the case.
> 
> Doesn't matter to me.
> 
> > > > > 
> > > > > If you're ok with the above, I'll fixup when applying.
> > > > > 
> 
> Best regards,
>    Maxim Levitsky






[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux