Re: [Qemu-devel] KVM: Windows 64-bit troubles with user space irqchip

Gleb Natapov <gleb@xxxxxxxxxx> · Wed, 2 Feb 2011 18:39:22 +0200



On Wed, Feb 02, 2011 at 05:36:53PM +0100, Jan Kiszka wrote:
> On 2011-02-02 17:29, Gleb Natapov wrote:
> > On Wed, Feb 02, 2011 at 04:52:11PM +0100, Jan Kiszka wrote:
> >> On 2011-02-02 16:46, Gleb Natapov wrote:
> >>> On Wed, Feb 02, 2011 at 04:35:25PM +0100, Jan Kiszka wrote:
> >>>> On 2011-02-02 16:09, Avi Kivity wrote:
> >>>>> On 02/02/2011 04:52 PM, Jan Kiszka wrote:
> >>>>>> On 2011-02-02 15:43, Jan Kiszka wrote:
> >>>>>>>  On 2011-02-02 15:35, Avi Kivity wrote:
> >>>>>>>>  On 02/02/2011 04:30 PM, Jan Kiszka wrote:
> >>>>>>>>>  On 2011-02-02 14:05, Avi Kivity wrote:
> >>>>>>>>>>   On 02/02/2011 02:50 PM, Jan Kiszka wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>    Oops, -smp 1. With -smp 2 it boots almost completely and then hangs.
> >>>>>>>>>>>
> >>>>>>>>>>>   Ah, good (or not good). With Windows 2003 Server, I actually get a Blue
> >>>>>>>>>>>   Screen (Stop 0x000000b8).
> >>>>>>>>>>
> >>>>>>>>>>   Userspace APIC is broken since it may run with an outdated cr8, does
> >>>>>>>>>>   reverting 27a4f7976d5 help?
> >>>>>>>>>
> >>>>>>>>>  Can you elaborate on what is broken? The way hw/apic.c maintains the
> >>>>>>>>>  tpr? Would it make sense to compare this against the in-kernel model? Or
> >>>>>>>>>  do you mean something else?
> >>>>>>>>
> >>>>>>>>  The problem, IIRC, was that we look up the TPR but it may already have
> >>>>>>>>  been changed by the running vcpu.  Not 100% sure.
> >>>>>>>>
> >>>>>>>>  If that is indeed the problem then the fix would be to process the APIC
> >>>>>>>>  in vcpu context (which is what the kernel does - we set a bit in the IRR
> >>>>>>>>  and all further processing is synchronous).
> >>>>>>>
> >>>>>>>  You mean: user space changes the tpr value while the vcpu is in KVM_RUN,
> >>>>>>>  then we return from the kernel and overwrite the tpr in the apic with
> >>>>>>>  the vcpu's view, right?
> >>>>>>
> >>>>>> Hmm, probably rather that there is a discrepancy between tpr and irr.
> >>>>>> The latter is changed asynchronously wrt the vcpu, the former wrt
> >>>>>> the user space device model.
> >>>>>
> >>>>> And yet, both are synchronized via qemu_mutex.  So we're still missing 
> >>>>> something in this picture.
> >>>>>
> >>>>>> Run apic_set_irq on the vcpu?
> >>>>>
> >>>>> static void apic_set_irq(APICState *s, int vector_num, int trigger_mode)
> >>>>> {
> >>>>>      apic_irq_delivered += !get_bit(s->irr, vector_num);
> >>>>>
> >>>>>      trace_apic_set_irq(apic_irq_delivered);
> >>>>>
> >>>>>      set_bit(s->irr, vector_num);
> >>>>>
> >>>>> This is even more async with kernel irqchip
> >>>>>
> >>>>>      if (trigger_mode)
> >>>>>          set_bit(s->tmr, vector_num);
> >>>>>      else
> >>>>>          reset_bit(s->tmr, vector_num);
> >>>>>
> >>>>> This is protected by qemu_mutex
> >>>>>
> >>>>>      apic_update_irq(s);
> >>>>>
> >>>>> This will be run the next time the vcpu exits, via apic_get_interrupt().
> >>>>
> >>>> The decision to pend an IRQ (and potentially kick the vcpu) takes place
> >>>> immediately in apic_update_irq. And it is based on current irr as well
> >>>> as tpr. But we update again when user space returns with a new value.
> >>>>
> >>>>>
> >>>>> }
> >>>>>
> >>>>> Did you check whether reverting that commit helps?
> >>>>>
> >>>>
> >>>> Just did so, and I can no longer reproduce the problem. Hmm...
> >>>>
> >>> If there is no problem in the logic of this commit (and I do not see
> >>> one yet) then somewhere we fail to kick the vcpu when an interrupt that
> >>> should be handled arrives?
> >>
> >> I'm not yet confident about the logic of the kernel patch: mov to cr8 is
> >> serializing. If the guest raises the tpr and then signals this with a
> >> succeeding, non vm-exiting instruction to the other vcpus, one of those
> >> could inject an interrupt with a higher priority than the previous tpr,
> >> but a lower one than current tpr. QEMU user space would accept this
> >> interrupt - and would likely surprise the guest. Do I miss something?
> >>
> > Injection happens by vcpu thread on cpu entry:
> > run->request_interrupt_window = kvm_arch_try_push_interrupts(env);
> > and tpr is synced on vcpu exit, so I do not yet see how what you describe
> > above may happen since during injection vcpu should see correct tpr.
> 
> Hmm, maybe this is the key: Once we call into apic_get_interrupt
> (because CPU_INTERRUPT_HARD was set as described above) and we find a
> pending irq below the tpr, we inject a spurious vector instead.
> 
That should be easy to verify. I expect Windows to BSOD upon receiving
a spurious vector though.
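
For reference, the path described above can be sketched as a standalone toy
model. This is not the code in hw/apic.c: the ToyApic type and the toy_*
helpers are hypothetical names, and comparing priority classes (vector >> 4
against tpr >> 4) is a simplifying assumption. It only illustrates how a
vector that was pended under an old tpr turns into a spurious vector once
the tpr has been raised.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NO_IRQ (-1)

/* Hypothetical, heavily reduced APIC state; not QEMU's APICState. */
typedef struct ToyApic {
    uint32_t irr[8];       /* 256 pending-request bits */
    uint8_t  tpr;          /* task priority register */
    uint8_t  spurious_vec; /* vector reported when nothing is deliverable */
} ToyApic;

/* Mark a vector as pending, as a device model would do asynchronously. */
static void toy_set_irr(ToyApic *s, int vector)
{
    assert(vector >= 0 && vector < 256);
    s->irr[vector >> 5] |= 1u << (vector & 31);
}

/* Return the highest pending vector, or TOY_NO_IRQ if irr is empty. */
static int toy_highest_irr(const ToyApic *s)
{
    for (int word = 7; word >= 0; word--) {
        for (int bit = 31; bit >= 0; bit--) {
            if (s->irr[word] & (1u << bit)) {
                return word * 32 + bit;
            }
        }
    }
    return TOY_NO_IRQ;
}

/*
 * Called on the vcpu thread once an interrupt has been flagged.
 * If the highest pending vector is masked by the current tpr, the
 * spurious vector is returned and the request stays in irr.
 */
static int toy_get_interrupt(ToyApic *s)
{
    int intno = toy_highest_irr(s);

    if (intno < 0) {
        return TOY_NO_IRQ;
    }
    if ((intno >> 4) <= (s->tpr >> 4)) {
        return s->spurious_vec;   /* pending irq is below the tpr */
    }
    s->irr[intno >> 5] &= ~(1u << (intno & 31));
    return intno;
}
```

With vector 0x51 pending and tpr raised to 0x60, toy_get_interrupt() returns
the spurious vector and leaves 0x51 set in the irr; once tpr drops back to
0x30 the same call delivers 0x51.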

--
			Gleb.