On Tue, Feb 23, 2010 at 11:37:21AM +0100, Jan Kiszka wrote: > Gleb Natapov wrote: > > On Tue, Feb 23, 2010 at 11:10:57AM +0100, Jan Kiszka wrote: > >> Gleb Natapov wrote: > >>> On Mon, Feb 22, 2010 at 06:51:23PM +0100, Jan Kiszka wrote: > >>>> Support both guest- as well as host-owned EFLAGS.TF while emulating > >>>> instructions. For guest-owned TF, we simply inject DB and update DR6.BS > >>>> after completing an instruction that has TF set on entry. To support > >>>> guest single-stepping under host control, we store the pending step > >>>> along with its CS and RIP and trigger a corresponding user space exit > >>>> once guest execution is about to resume. This check is is also required > >>>> in the VMX emulation loop during invalid guest states. > >>>> > >>> Emulator currently is a total mess. It is not a good time to add more mess > >>> there right now IMO. > >> Then let's clean up what you consider "mess" in this feature. Unless > >> there are plans to clean up the emulator for the next or next-but-one > >> kernel release, I do not want to wait for this. > >> > > There are plans to cleanup the emulator. > > When? ASAP :) I am looking into that, but it will not be easy. > > > > >>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>>> --- > >>>> arch/x86/include/asm/kvm_host.h | 5 +++ > >>>> arch/x86/kvm/vmx.c | 6 +++ > >>>> arch/x86/kvm/x86.c | 65 ++++++++++++++++++++++++++++++++++----- > >>>> 3 files changed, 68 insertions(+), 8 deletions(-) > >>>> > >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > >>>> index d46e791..d69d8aa 100644 > >>>> --- a/arch/x86/include/asm/kvm_host.h > >>>> +++ b/arch/x86/include/asm/kvm_host.h > >>>> @@ -362,8 +362,11 @@ struct kvm_vcpu_arch { > >>>> u64 *mce_banks; > >>>> > >>>> /* used for guest single stepping over the given code position */ > >>>> + bool singlestep_pending; > >>>> u16 singlestep_cs; > >>>> + u16 singlestep_pending_cs; > >>>> unsigned long singlestep_rip; > >>>> + unsigned long singlestep_pending_rip; > >>> If we are going to have many of those rip/cs pairs may be it is better > >>> to add structure linear_ip and have functions is_same_ip(). > >> Agreed. > >> > >>> > >>>> /* fields used by HYPER-V emulation */ > >>>> u64 hv_vapic; > >>>> }; > >>>> @@ -820,4 +823,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v); > >>>> void kvm_define_shared_msr(unsigned index, u32 msr); > >>>> void kvm_set_shared_msr(unsigned index, u64 val, u64 mask); > >>>> > >>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu); > >>>> + > >>>> #endif /* _ASM_X86_KVM_HOST_H */ > >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > >>>> index d772476..317828f 100644 > >>>> --- a/arch/x86/kvm/vmx.c > >>>> +++ b/arch/x86/kvm/vmx.c > >>>> @@ -3489,6 +3489,12 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) > >>>> goto out; > >>>> if (need_resched()) > >>>> schedule(); > >>>> + > >>>> + if (unlikely(vcpu->arch.singlestep_pending)) { > >>>> + ret = kvm_check_guest_singlestep(vcpu); > >>>> + if (ret == 0) > >>>> + goto out; > >>>> + } > >>>> } > >>>> > >>>> vmx->emulation_required = 0; > >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > >>>> index 19e8b28..6ebebb9 100644 > >>>> --- a/arch/x86/kvm/x86.c > >>>> +++ b/arch/x86/kvm/x86.c > >>>> @@ -3441,6 +3441,27 @@ static void cache_all_regs(struct kvm_vcpu *vcpu) > >>>> vcpu->arch.regs_dirty = ~0; > >>>> } > >>>> > >>>> +static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg) > >>>> +{ > >>>> + struct kvm_segment kvm_seg; > >>>> + > >>>> + kvm_get_segment(vcpu, &kvm_seg, seg); > >>>> + return kvm_seg.selector; > >>>> +} > >>>> + > >>>> +static void queue_singlestep(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) { > >>>> + vcpu->arch.singlestep_pending = true; > >>>> + vcpu->arch.singlestep_pending_cs = > >>>> + get_segment_selector(vcpu, VCPU_SREG_CS); > >>>> + vcpu->arch.singlestep_pending_rip = kvm_rip_read(vcpu); > >>> Why should we remember rip where TF happened? We should exit > >>> immediately to userspace anyway, no? > >> I think MMIO exits takes precedence, so this is intended to exit after > >> they completed, ie. after the instruction is fully finished. > >> > >>>> + } else { > >>>> + vcpu->arch.dr6 |= DR6_BS; > >>>> + kvm_queue_exception(vcpu, DB_VECTOR); > >>> What if instruction emulation generated fault? > >> Fault-like exceptions will trigger before that, and the instruction > >> won't complete. Do we have any trap-like exceptions to worry about? > >> > > They will not trigger before that. They will be queued for the next > > entry and queuing another one will either overwrite the previous one, > > or will queue double fault (depending on what what the first exception). > > The will not stack as the instruction failed, thus no singlestep will be > queued as well. Instruction failed doesn't mean emulation failed, so lets see what happens when you single step over instruction that generates page fault. #PF is queued and x86_emulate_insn() returns 0 to emulate_instruction() no you call queue_singlestep() which calls kvm_queue_exception(vcpu, DB_VECTOR); and this cause #DF to be injected. > > > > >>>> + } > >>>> +} > >>>> + > >>>> int emulate_instruction(struct kvm_vcpu *vcpu, > >>>> unsigned long cr2, > >>>> u16 error_code, > >>>> @@ -3449,6 +3470,7 @@ int emulate_instruction(struct kvm_vcpu *vcpu, > >>>> int r, shadow_mask; > >>>> struct decode_cache *c; > >>>> struct kvm_run *run = vcpu->run; > >>>> + bool singlestep; > >>>> > >>>> kvm_clear_exception_queue(vcpu); > >>>> vcpu->arch.mmio_fault_cr2 = cr2; > >>>> @@ -3515,8 +3537,12 @@ int emulate_instruction(struct kvm_vcpu *vcpu, > >>>> } > >>>> } > >>>> > >>>> + singlestep = vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_TF; > >>>> + > >>>> if (emulation_type & EMULTYPE_SKIP) { > >>>> kvm_rip_write(vcpu, vcpu->arch.emulate_ctxt.decode.eip); > >>>> + if (singlestep) > >>>> + queue_singlestep(vcpu); > >>> Instruction that wasn't emulated shouldn't generate faults. > >>> > >> Skipping here doesn't mean it's not emulated. A valid question might be > >> if we should catch it here or in skip_emulated_instruction. > > In skip_emulated_instruction() or even above that. Only at the point we > > are sure we actually emulate instruction. > > > >>>> return EMULATE_DONE; > >>>> } > >>>> > >>>> @@ -3549,6 +3575,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu, > >>>> > >>>> kvm_x86_ops->set_rflags(vcpu, vcpu->arch.emulate_ctxt.eflags); > >>>> > >>>> + if (singlestep) > >>>> + queue_singlestep(vcpu); > >>>> + > >>> if vcpu->mmio_is_write == true we can still exit with DO_MMIO, so > >>> instruction is not yes completely executed. This queue_singlestep > >>> mechanism looks bogus anyway. emulate_instruction() caller should > >>> initiate exit to userspace space if required. > >> That's while it is _queued_, not immediately delivered: MMIO exits will > >> continue to take precedence. > > That is bogus. We should queue TF only after instruction is completely This should have been "We should queue #DB ..". > > emulated, not during emulation. Instruction may generate dozen MMIO > > exits and eventually be aborted by exception, so DB shouldn't be generated > > at all in this case. > > How to detect this? This potential looping over user space caused sever > headache will designing the TF feature. I thought I got all cases. > How to detect that emulation is complete? emulate_instruction() should return EMULATE_DONE in this case. > The current model is: if we first execute a different instruction than > the one that singlestep targets at, singlestep should be cleared without > having any effect. Re-executing the stepped instruction after a > potential exception resolution should then reassert singlestep and > requeue it properly. > > > > >>>> if (vcpu->mmio_is_write) { > >>>> vcpu->mmio_needed = 0; > >>>> return EMULATE_DO_MMIO; > >>>> @@ -4450,6 +4479,26 @@ out: > >>>> return r; > >>>> } > >>>> > >>>> +int kvm_check_guest_singlestep(struct kvm_vcpu *vcpu) > >>>> +{ > >>>> + unsigned long rip = kvm_rip_read(vcpu); > >>>> + > >>>> + vcpu->arch.singlestep_pending = false; > >>>> + > >>>> + if (vcpu->arch.singlestep_pending_cs != > >>>> + get_segment_selector(vcpu, VCPU_SREG_CS) || > >>>> + vcpu->arch.singlestep_pending_rip != rip) > >>>> + return 1; > >>>> + > >>> Again how this check can be false? > >> E.g. someone fiddled with the VCPU state, resetting the guest. > > How is this someone? It should reset this state too then. > > If somehow possible, I do not want to make this state part of the user > visible VCPU state but rather translate it into a pending #DB + set > DR6.BS (for guest-owned TF) or drop it (for host-owned - not critical here). As far as I can see for guest owned TF you always inject #DB immediately in queue_singlestep(). And I think this should always be the case. At the end of instruction emulator should inject #DB. When we enter vcpu or emulator again if #DB interception is enables and #DB is pending intercept function should be called. > > > > >>>> + vcpu->run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1; > >>>> + vcpu->run->debug.arch.dr7 = 0; > >>>> + vcpu->run->exit_reason = KVM_EXIT_DEBUG; > >>>> + vcpu->run->debug.arch.pc = get_segment_base(vcpu, VCPU_SREG_CS) + rip; > >>>> + vcpu->run->debug.arch.exception = DB_VECTOR; > >>>> + > >>>> + return 0; > >>>> +} > >>>> +EXPORT_SYMBOL_GPL(kvm_check_guest_singlestep); > >>>> > >>>> static int __vcpu_run(struct kvm_vcpu *vcpu) > >>>> { > >>>> @@ -4471,6 +4520,12 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) > >>>> > >>>> r = 1; > >>>> while (r > 0) { > >>>> + if (unlikely(vcpu->arch.singlestep_pending)) { > >>>> + r = kvm_check_guest_singlestep(vcpu); > >>>> + if (r == 0) > >>>> + break; > >>>> + } > >>>> + > >>> Why not use existing mechanism to cause run loop to exit to userspace > >>> i.e return 0 from vcpu_enter_guest(), instead of adding special cases here? > >>> > >> I wanted to exit in case of vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE > >> as well, but thinking about this again it's actually more reasonable to > >> exit once the VCPU unblocks again, e.g. once halt resumes. > >> > > We will exit immediately after halt if return value is 0, why would you > > want to exit on vcpu entry after that? > > As I said: I will move this into vcpu_enter_guest. > > Jan > > -- > Siemens AG, Corporate Technology, CT T DE IT 1 > Corporate Competence Center Embedded Linux -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html