On Wed, Feb 02, 2011 at 05:36:53PM +0100, Jan Kiszka wrote:
> On 2011-02-02 17:29, Gleb Natapov wrote:
> > On Wed, Feb 02, 2011 at 04:52:11PM +0100, Jan Kiszka wrote:
> >> On 2011-02-02 16:46, Gleb Natapov wrote:
> >>> On Wed, Feb 02, 2011 at 04:35:25PM +0100, Jan Kiszka wrote:
> >>>> On 2011-02-02 16:09, Avi Kivity wrote:
> >>>>> On 02/02/2011 04:52 PM, Jan Kiszka wrote:
> >>>>>> On 2011-02-02 15:43, Jan Kiszka wrote:
> >>>>>>> On 2011-02-02 15:35, Avi Kivity wrote:
> >>>>>>>> On 02/02/2011 04:30 PM, Jan Kiszka wrote:
> >>>>>>>>> On 2011-02-02 14:05, Avi Kivity wrote:
> >>>>>>>>>> On 02/02/2011 02:50 PM, Jan Kiszka wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Oops, -smp 1. With -smp 2 it boots almost completely and then hangs.
> >>>>>>>>>>>
> >>>>>>>>>>> Ah, good (or not good). With Windows 2003 Server, I actually get a Blue
> >>>>>>>>>>> Screen (Stop 0x000000b8).
> >>>>>>>>>>
> >>>>>>>>>> Userspace APIC is broken since it may run with an outdated cr8, does
> >>>>>>>>>> reverting 27a4f7976d5 help?
> >>>>>>>>>
> >>>>>>>>> Can you elaborate on what is broken? The way hw/apic.c maintains the
> >>>>>>>>> tpr? Would it make sense to compare this against the in-kernel model? Or
> >>>>>>>>> do you mean something else?
> >>>>>>>>
> >>>>>>>> The problem, IIRC, was that we look up the TPR but it may already have
> >>>>>>>> been changed by the running vcpu. Not 100% sure.
> >>>>>>>>
> >>>>>>>> If that is indeed the problem then the fix would be to process the APIC
> >>>>>>>> in vcpu context (which is what the kernel does - we set a bit in the IRR
> >>>>>>>> and all further processing is synchronous).
> >>>>>>>
> >>>>>>> You mean: user space changes the tpr value while the vcpu is in KVM_RUN,
> >>>>>>> then we return from the kernel and overwrite the tpr in the apic with
> >>>>>>> the vcpu's view, right?
> >>>>>>
> >>>>>> Hmm, probably rather that there is a discrepancy between tpr and irr.
> >>>>>> The latter is changed asynchronously wrt the vcpu, the former wrt
> >>>>>> the user space device model.
> >>>>>
> >>>>> And yet, both are synchronized via qemu_mutex. So we're still missing
> >>>>> something in this picture.
> >>>>>
> >>>>>> Run apic_set_irq on the vcpu?
> >>>>>
> >>>>> static void apic_set_irq(APICState *s, int vector_num, int trigger_mode)
> >>>>> {
> >>>>>     apic_irq_delivered += !get_bit(s->irr, vector_num);
> >>>>>
> >>>>>     trace_apic_set_irq(apic_irq_delivered);
> >>>>>
> >>>>>     set_bit(s->irr, vector_num);
> >>>>>
> >>>>> This is even more async with kernel irqchip
> >>>>>
> >>>>>     if (trigger_mode)
> >>>>>         set_bit(s->tmr, vector_num);
> >>>>>     else
> >>>>>         reset_bit(s->tmr, vector_num);
> >>>>>
> >>>>> This is protected by qemu_mutex
> >>>>>
> >>>>>     apic_update_irq(s);
> >>>>>
> >>>>> This will be run the next time the vcpu exits, via apic_get_interrupt().
> >>>>
> >>>> The decision to pend an IRQ (and potentially kick the vcpu) takes place
> >>>> immediately in apic_update_irq. And it is based on the current irr as well
> >>>> as the tpr. But we update again when user space returns with a new value.
> >>>>
> >>>>>
> >>>>> }
> >>>>>
> >>>>> Did you check whether reverting that commit helps?
> >>>>>
> >>>>
> >>>> Just did so, and I can no longer reproduce the problem. Hmm...
> >>>>
> >>> If there is no problem in the logic of this commit (and I do not see
> >>> one yet) then somewhere we miss kicking the vcpu when an interrupt that
> >>> should be handled arrives?
> >>
> >> I'm not yet confident about the logic of the kernel patch: mov to cr8 is
> >> serializing. If the guest raises the tpr and then signals this with a
> >> succeeding, non vm-exiting instruction to the other vcpus, one of those
> >> could inject an interrupt with a higher priority than the previous tpr,
> >> but a lower one than the current tpr. QEMU user space would accept this
> >> interrupt - and would likely surprise the guest. Am I missing something?
> >>
> > Injection happens by the vcpu thread on cpu entry:
> >  run->request_interrupt_window = kvm_arch_try_push_interrupts(env);
> > and the tpr is synced on vcpu exit, so I do not yet see how what you describe
> > above may happen, since during injection the vcpu should see the correct tpr.
>
> Hmm, maybe this is the key: Once we call into apic_get_interrupt
> (because CPU_INTERRUPT_HARD was set as described above) and we find a
> pending irq below the tpr, we inject a spurious vector instead.
>
That should be easy to verify. I expect Windows to BSOD upon receiving a
spurious vector though.

--
			Gleb.
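
To make the scenario discussed above concrete, here is a minimal sketch in C of the two decision points: the "should we kick the vcpu" check made when the irq is delivered, and the later check at injection time after the tpr has been re-synced. This is a toy model under stated assumptions, not the code in hw/apic.c: ToyApic, toy_should_kick, toy_get_interrupt and TOY_SPURIOUS_VECTOR are all hypothetical names, the priority comparison is simplified to the upper four bits of the vector against the upper four bits of the tpr, and the spurious vector is hard-wired to 0xff for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: names and layout are hypothetical, not QEMU's hw/apic.c. */
#define TOY_SPURIOUS_VECTOR 0xff

typedef struct {
    uint32_t irr[8];   /* 256 pending-request bits, one per vector */
    uint8_t  tpr;      /* task priority as last seen by the device model */
} ToyApic;

/* Mark a vector as pending in the IRR. */
static void toy_set_irq(ToyApic *s, int vector)
{
    assert(vector >= 0 && vector < 256);
    s->irr[vector >> 5] |= UINT32_C(1) << (vector & 31);
}

/* Highest pending vector, or -1 if the IRR is empty. */
static int toy_highest_irr(const ToyApic *s)
{
    for (int v = 255; v >= 0; v--) {
        if (s->irr[v >> 5] & (UINT32_C(1) << (v & 31))) {
            return v;
        }
    }
    return -1;
}

/* Delivery-time decision: would the device model kick the vcpu,
 * judging by the tpr value it currently holds? */
static bool toy_should_kick(const ToyApic *s)
{
    int v = toy_highest_irr(s);
    return v >= 0 && (v >> 4) > (s->tpr >> 4);
}

/* Injection-time decision, made after the tpr has been synced from the
 * vcpu. If the pending vector is no longer above the tpr, the model
 * hands back the spurious vector and leaves the IRR bit set. */
static int toy_get_interrupt(ToyApic *s, uint8_t synced_tpr)
{
    s->tpr = synced_tpr;
    int v = toy_highest_irr(s);
    if (v < 0) {
        return -1;
    }
    if ((v >> 4) <= (s->tpr >> 4)) {
        return TOY_SPURIOUS_VECTOR;
    }
    s->irr[v >> 5] &= ~(UINT32_C(1) << (v & 31));
    return v;
}
```

With a stale tpr of 0x40 and vector 0x51 pending, toy_should_kick() says yes; if the guest has meanwhile raised its tpr to 0x60, toy_get_interrupt() returns the spurious vector rather than 0x51, which is the case suspected in the thread.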