Re: [PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/30/2012 04:01 PM, Kevin Wolf wrote:
> Am 30.01.2012 14:23, schrieb Avi Kivity:
> > On 01/30/2012 01:05 PM, Kevin Wolf wrote:
> >> Am 30.01.2012 11:24, schrieb Avi Kivity:
> >>> On 01/27/2012 09:23 PM, Kevin Wolf wrote:
> >>>> Task switches can switch between Protected Mode and VM86. The current
> >>>> mode must be updated during the task switch emulation so that the new
> >>>> segment selectors are interpreted correctly and privilege checks
> >>>> succeed.
> >>>>
> >>>> VMX code calculates the CPL from the code segment selector and rflags,
> >>>> so it needs rflags to be updated in the vcpu struct. SVM stores the DPL
> >>>> of the code segment instead, so we must be sure to give the right one
> >>>> when updating the selector.
> >>>
> >>> svm has vmcb_save_area::cpl; it's independent of CS.RPL.
> >>
> >> Right, but cpl in the VMCB is updated when you reload a segment (and I
> >> think only then), 
> > 
> > Gah, it's broken.  It should be qualified by more - real mode should
> > keep cpl at zero, vm86 at 3.  And setting rflags.vm86 should update cpl
> > as well.
> > 
> > For example, live migration while in real mode with cs&3!=0 or in vm86
> > mode with cs&3==0 would set the wrong cpl.
>
> Ah, right, I didn't think of this case.
>
> Not sure if I should fix it in this series. 

No need.  But we need to have a clear picture of where we're going so we
move in the right direction.  And that direction is cpl decoupled from
cs.rpl/ss.rpl/cr0.pe/eflags.vm (but changes in response to them, except
cr0.pe).

Note that we don't expose cpl as separate state for save/restore, so the
kvm ABI follows vmx instead of svm (unfortunately).

> If we do the right fixes to
> the task switch, it won't be strictly needed for the bug I'm fixing.

Right.

> >>
> >> 3. Load segment descriptors, and disable the privilege checks in
> >>    load_segment_descriptor using a new flag. 
> > 
> > I don't like doing things that aren't directly traceable to the SDM. 
> > Perhaps we should pass the cpl as a parameter to
> > load_segment_descriptor().  Or we should ->set_cpl() just before.
>
> I like the idea of having a ->set_cpl(). For SVM it should be trivial to
> implement.
>
> For VMX, I think instead of clearing VCPU_EXREG_CPL, vcpu_vmx->cpl
> should directly be updated in set_rflags and set_segment (and in the new
> set_cpl, obviously).

vcpu_vmx::cpl is currently just a cache, when valid it reflects the
state of cr0.pe/eflags.vm/cs.rpl.  It's not separate state as the VMCS
has nowhere to hold it.  vmx cannot virtualize the sequence 'mov $3,
%ss; mov $1, %eax; mov %eax, %cr0; nop' - we have
emulate_invalid_guest_state for that, but it's not on by default.

What you propose is converting vcpu_vmx::cpl from a cache to separate
state, but that state can change after a vmexit.  It's not entirely
clear when it's valid and when it isn't.

> Would that be enough or would we have to avoid clearing it in all other
> places as well? Where would it be initialised if it's not enough?

Maybe vmx_vcpu_reset().

>
> >> For SVM, this updates
> >>    the CPL when cs is reloaded.
> > 
> > 
> > 
> >>
> >> 4. Call ctxt->ops.set_rflag so that VMX updates the CPL. Should be
> >>    cleaned up in a follow-up patch, so that VMX uses set_segment
> >>    to update the CPL, like SVM does.
> >>
> >> Would this match your interpretation?
> > 
> > Not exactly.  I claim that the cpl is a separate state from
> > cs.cpl/ss.rpl/cr0.pe/eflags.vm - it cannot be derived just from those. 
> > On the transition from real mode to protected mode, you can have (say)
> > cs=0x13 and cpl=0.
>
> But even with cs=0x13 cs.dpl would still be 0 (in the segment cache)
> before cs is reloaded and you switch to a real protected mode segment,
> right?

Right.  But cpl is defined as cs.rpl, not cs.dpl, and I think there are
cases where cs.rpl != cs.dpl.

> > Outside of mode switches, it's good to have set_segment() update the cpl
> > automatically.  But inside mode switches it can screw up further checks
> > or aborts.
>
> Can you give a specific example of how it screws things up?

KVM_SET_REGS/KVM_SET_SREGS happen non-atomically, so you might have
KVM_SET_SREGS raising the CPL to 3 only to lower it afterwards.  Things
in the middle happen with the wrong CPL.  Not sure if anything actually
checks it there.

The other case is what we're looking at, task switch.  To actually
update cpl, set_segment() needs to look at cr0.pe and eflags, but these
might not have been committed yet.  It's all solvable but the solution
involves knowing exactly what kvm and the emulator depend on, which
makes it very fragile.  I think giving the emulator less complicated
primitives will make it more readable.

>
> >>>>  static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt)
> >>>>  {
> >>>>  	return kvm_x86_ops->get_cpl(emul_to_vcpu(ctxt));
> >>>> @@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = {
> >>>>  	.set_idt	     = emulator_set_idt,
> >>>>  	.get_cr              = emulator_get_cr,
> >>>>  	.set_cr              = emulator_set_cr,
> >>>> +	.set_rflags          = emulator_set_rflags,
> >>>>  	.cpl                 = emulator_get_cpl,
> >>>>  	.get_dr              = emulator_get_dr,
> >>>>  	.set_dr              = emulator_set_dr,
> >>>
> >>> It would be good to switch the entire emulator to use ->set_rflags()
> >>> instead of sometimes letting the caller do it.
> >>
> >> If we change the CPL only with a cs reload, set_rflags can be dropped
> >> completely in the end.
> > 
> > That's attractive, yes.
> > 
> > Suppose load_segment_descriptor(..., VCPU_SREG_LDTR) fails.  What should
> > the new cpl be?  Does it matter?
>
> I think the one matching the new cs/eflags. The SDM says that the task
> switch is completed without further segment checks and then an exception
> is thrown.

My bet is that it will be 3 if eflags.vm=1 and unchanged otherwise --
the cpl update happens when the segment cache is updated.  But that's
just a guess.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux