Am 30.01.2012 15:32, schrieb Avi Kivity: > On 01/30/2012 04:01 PM, Kevin Wolf wrote: >> Am 30.01.2012 14:23, schrieb Avi Kivity: >>> On 01/30/2012 01:05 PM, Kevin Wolf wrote: >>>> Am 30.01.2012 11:24, schrieb Avi Kivity: >>>>> On 01/27/2012 09:23 PM, Kevin Wolf wrote: >>>>>> Task switches can switch between Protected Mode and VM86. The current >>>>>> mode must be updated during the task switch emulation so that the new >>>>>> segment selectors are interpreted correctly and privilege checks >>>>>> succeed. >>>>>> >>>>>> VMX code calculates the CPL from the code segment selector and rflags, >>>>>> so it needs rflags to be updated in the vcpu struct. SVM stores the DPL >>>>>> of the code segment instead, so we must be sure to give the right one >>>>>> when updating the selector. >>>>> >>>>> svm has vmcb_save_area::cpl; it's independent of CS.RPL. >>>> >>>> Right, but cpl in the VMCB is updated when you reload a segment (and I >>>> think only then), >>> >>> Gah, it's broken. It should be qualified by more - real mode should >>> keep cpl at zero, vm86 at 3. And setting rflags.vm86 should update cpl >>> as well. >>> >>> For example, live migration while in real mode with cs&3!=0 or in vm86 >>> mode with cs&3==0 would set the wrong cpl. >> >> Ah, right, I didn't think of this case. >> >> Not sure if I should fix it in this series. > > No need. But we need to have a clear picture of where we're going so we > move in the right direction. And that direction is cpl decoupled from > cs.rpl/ss.rpl/cr0.pe/eflags.vm (but changes in response to them, except > cr0.pe). > > Note that we don't expose cpl as separate state for save/restore, so the > kvm ABI follows vmx instead of svm (unfortunately). Which means that restoring a VM that was in the middle of a RM -> PM mode switch will result in corrupted state. Probably not a huge problem in practice, but can we fail the save? >> If we do the right fixes to >> the task switch, it won't be strictly needed for the bug I'm fixing. > > Right. > >>>> >>>> 3. Load segment descriptors, and disable the privilege checks in >>>> load_segment_descriptor using a new flag. >>> >>> I don't like doing things that aren't directly traceable to the SDM. >>> Perhaps we should pass the cpl as a parameter to >>> load_segment_descriptor(). Or we should ->set_cpl() just before. >> >> I like the idea of having a ->set_cpl(). For SVM it should be trivial to >> implement. >> >> For VMX, I think instead of clearing VCPU_EXREG_CPL, vcpu_vmx->cpl >> should directly be updated in set_rflags and set_segment (and in the new >> set_cpl, obviously). > > vcpu_vmx::cpl is currently just a cache, when valid it reflects the > state of cr0.pe/eflags.vm/cs.rpl. It's not separate state as the VMCS > has nowhere to hold it. vmx cannot virtualize the sequence 'mov $3, > %ss; mov $1, %eax; mov %eax, %cr0; nop' - we have > emulate_invalid_guest_state for that, but it's not on by default. > > What you propose is converting vcpu_vmx::cpl from a cache to separate > state, but that state can change after a vmexit. It's not entirely > clear when it's valid and when it isn't. It's the only way I can imagine to get a set_cpl() callback. Do you have another one? That it's not entirely clear where it's valid and where it needs to be set is exactly the problem I have with it. Probably even more than you as I'm not very familiar with the KVM code. >> Would that be enough or would we have to avoid clearing it in all other >> places as well? Where would it be initialised if it's not enough? > > Maybe vmx_vcpu_reset(). Do all CPL changes go through set_cr0/segment/rflags/cpl? I guess yes, so initialising on reset and keeping it valid all the time should be possible indeed. >>>> For SVM, this updates >>>> the CPL when cs is reloaded. >>> >>> >>> >>>> >>>> 4. Call ctxt->ops.set_rflag so that VMX updates the CPL. Should be >>>> cleaned up in a follow-up patch, so that VMX uses set_segment >>>> to update the CPL, like SVM does. >>>> >>>> Would this match your interpretation? >>> >>> Not exactly. I claim that the cpl is a separate state from >>> cs.cpl/ss.rpl/cr0.pe/eflags.vm - it cannot be derived just from those. >>> On the transition from real mode to protected mode, you can have (say) >>> cs=0x13 and cpl=0. >> >> But even with cs=0x13 cs.dpl would still be 0 (in the segment cache) >> before cs is reloaded and you switch to a real protected mode segment, >> right? > > Right. But cpl is defined as cs.rpl, not cs.dpl, and I think there are > cases where cs.rpl != cs.dpl. Sounds like conforming code segments? A part of protected mode magic that I've never touched and never intended to... But using the RPL instead makes sense in that context. So SVM isn't only wrong in not considering real mode and VM86, but also in calculating CPL from vmcb->save.cs.attrib rather than from the RPL in the selector? >>> Outside of mode switches, it's good to have set_segment() update the cpl >>> automatically. But inside mode switches it can screw up further checks >>> or aborts. >> >> Can you give a specific example of how it screws things up? > > KVM_SET_REGS/KVM_SET_SREGS happen non-atomically, so you might have > KVM_SET_SREGS raising the CPL to 3 only to lower it afterwards. Things > in the middle happen with the wrong CPL. Not sure if anything actually > checks it there. > > The other case is what we're looking at, task switch. To actually > update cpl, set_segment() needs to look at cr0.pe and eflags, but these > might not have been committed yet. It's all solvable but the solution > involves knowing exactly what kvm and the emulator depend on, which > makes it very fragile. I think giving the emulator less complicated > primitives will make it more readable. I think the main problem here is that you have two sets of registers, one in the vcpu struct and one in the emulator context. >>>>>> static int emulator_get_cpl(struct x86_emulate_ctxt *ctxt) >>>>>> { >>>>>> return kvm_x86_ops->get_cpl(emul_to_vcpu(ctxt)); >>>>>> @@ -4199,6 +4204,7 @@ static struct x86_emulate_ops emulate_ops = { >>>>>> .set_idt = emulator_set_idt, >>>>>> .get_cr = emulator_get_cr, >>>>>> .set_cr = emulator_set_cr, >>>>>> + .set_rflags = emulator_set_rflags, >>>>>> .cpl = emulator_get_cpl, >>>>>> .get_dr = emulator_get_dr, >>>>>> .set_dr = emulator_set_dr, >>>>> >>>>> It would be good to switch the entire emulator to use ->set_rflags() >>>>> instead of sometimes letting the caller do it. >>>> >>>> If we change the CPL only with a cs reload, set_rflags can be dropped >>>> completely in the end. >>> >>> That's attractive, yes. >>> >>> Suppose load_segment_descriptor(..., VCPU_SREG_LDTR) fails. What should >>> the new cpl be? Does it matter? >> >> I think the one matching the new cs/eflags. The SDM says that the task >> switch is completed without further segment checks and then an exception >> is thrown. > > My bet is that it will be 3 if eflags.vm=1 and unchanged otherwise -- > the cpl update happens when the segment cache is updated. But that's > just a guess. Does even anyone see the new CPL in error cases? An exception is thrown immediately, so cs is reloaded and we get an even newer CPL. So to take any notice of the CPL, the "complete the task switch" part would have to fail a privilege check. The one thing that comes to mind is that pushing an error code could fail, but I don't think this is considered part of the task switch. Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html