Re: [Qemu-devel] KVM: Windows 64-bit troubles with user space irqchip

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Wed, 02 Feb 2011 17:51:32 +0100

On 2011-02-02 17:39, Gleb Natapov wrote:
> On Wed, Feb 02, 2011 at 05:36:53PM +0100, Jan Kiszka wrote:
>> On 2011-02-02 17:29, Gleb Natapov wrote:
>>> On Wed, Feb 02, 2011 at 04:52:11PM +0100, Jan Kiszka wrote:
>>>> On 2011-02-02 16:46, Gleb Natapov wrote:
>>>>> On Wed, Feb 02, 2011 at 04:35:25PM +0100, Jan Kiszka wrote:
>>>>>> On 2011-02-02 16:09, Avi Kivity wrote:
>>>>>>> On 02/02/2011 04:52 PM, Jan Kiszka wrote:
>>>>>>>> On 2011-02-02 15:43, Jan Kiszka wrote:
>>>>>>>>>  On 2011-02-02 15:35, Avi Kivity wrote:
>>>>>>>>>>  On 02/02/2011 04:30 PM, Jan Kiszka wrote:
>>>>>>>>>>>  On 2011-02-02 14:05, Avi Kivity wrote:
>>>>>>>>>>>>   On 02/02/2011 02:50 PM, Jan Kiszka wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    Oops, -smp 1. With -smp 2 it boots almost completely and then hangs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Ah, good (or not good). With Windows 2003 Server, I actually get a Blue
>>>>>>>>>>>>>   Screen (Stop 0x000000b8).
>>>>>>>>>>>>
>>>>>>>>>>>>   Userspace APIC is broken since it may run with an outdated cr8, does
>>>>>>>>>>>>   reverting 27a4f7976d5 help?
>>>>>>>>>>>
>>>>>>>>>>>  Can you elaborate on what is broken? The way hw/apic.c maintains the
>>>>>>>>>>>  tpr? Would it make sense to compare this against the in-kernel model? Or
>>>>>>>>>>>  do you mean something else?
>>>>>>>>>>
>>>>>>>>>>  The problem, IIRC, was that we look up the TPR but it may already have
>>>>>>>>>>  been changed by the running vcpu.  Not 100% sure.
>>>>>>>>>>
>>>>>>>>>>  If that is indeed the problem then the fix would be to process the APIC
>>>>>>>>>>  in vcpu context (which is what the kernel does - we set a bit in the IRR
>>>>>>>>>>  and all further processing is synchronous).
>>>>>>>>>
>>>>>>>>>  You mean: user space changes the tpr value while the vcpu is in KVM_RUN,
>>>>>>>>>  then we return from the kernel and overwrite the tpr in the apic with
>>>>>>>>>  the vcpu's view, right?
>>>>>>>>
>>>>>>>> Hmm, probably rather that there is a discrepancy between tpr and irr.
>>>>>>>> The latter is changed asynchronously wrt the vcpu, the former wrt
>>>>>>>> the user space device model.
>>>>>>>
>>>>>>> And yet, both are synchronized via qemu_mutex.  So we're still missing 
>>>>>>> something in this picture.
>>>>>>>
>>>>>>>> Run apic_set_irq on the vcpu?
>>>>>>>
>>>>>>> static void apic_set_irq(APICState *s, int vector_num, int trigger_mode)
>>>>>>> {
>>>>>>>      apic_irq_delivered += !get_bit(s->irr, vector_num);
>>>>>>>
>>>>>>>      trace_apic_set_irq(apic_irq_delivered);
>>>>>>>
>>>>>>>      set_bit(s->irr, vector_num);
>>>>>>>
>>>>>>> This is even more async with kernel irqchip
>>>>>>>
>>>>>>>      if (trigger_mode)
>>>>>>>          set_bit(s->tmr, vector_num);
>>>>>>>      else
>>>>>>>          reset_bit(s->tmr, vector_num);
>>>>>>>
>>>>>>> This is protected by qemu_mutex
>>>>>>>
>>>>>>>      apic_update_irq(s);
>>>>>>>
>>>>>>> This will be run the next time the vcpu exits, via apic_get_interrupt().
>>>>>>
>>>>>> The decision to pend an IRQ (and potentially kick the vcpu) takes place
>>>>>> immediately in apic_update_irq. And it is based on current irr as well
>>>>>> as tpr. But we update again when user space returns with a new value.
>>>>>>
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> Did you check whether reverting that commit helps?
>>>>>>>
>>>>>>
>>>>>> Just did so, and I can no longer reproduce the problem. Hmm...
>>>>>>
>>>>> If there is no problem in the logic of this commit (and I do not see
>>>>> one yet) then somewhere we miss kicking the vcpu when an interrupt that
>>>>> should be handled arrives?
>>>>
>>>> I'm not yet confident about the logic of the kernel patch: mov to cr8 is
>>>> serializing. If the guest raises the tpr and then signals this with a
>>>> succeeding, non vm-exiting instruction to the other vcpus, one of those
>>>> could inject an interrupt with a higher priority than the previous tpr,
>>>> but a lower one than current tpr. QEMU user space would accept this
>>>> interrupt - and would likely surprise the guest. Am I missing something?
>>>>
>>> Injection is done by the vcpu thread on cpu entry:
>>> run->request_interrupt_window = kvm_arch_try_push_interrupts(env);
>>> and tpr is synced on vcpu exit, so I do not yet see how what you describe
>>> above may happen, since during injection the vcpu should see the correct tpr.
>>
>> Hmm, maybe this is the key: Once we call into apic_get_interrupt
>> (because CPU_INTERRUPT_HARD was set as described above) and we find a
>> pending irq below the tpr, we inject a spurious vector instead.
>>
> That should be easy to verify. I expect Windows to BSOD upon receiving
> a spurious vector though.

I hacked spurious irq injection away, but the issue remains. At the same
time, Windows is receiving tons of spurious interrupts without any
complaints, even without that tpr optimization in the kernel. So this is
obviously not yet the key.
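
For reference, the decision being discussed can be sketched roughly as
follows. This is a simplified, self-contained illustration and not the
actual hw/apic.c code: the SketchApic type and all sketch_* names are
made up for this mail, the ISR is ignored, and the comparison uses the
architectural priority class (upper four bits of the vector against the
upper four bits of the tpr). It only shows how a pending vector at or
below the tpr turns into a spurious vector instead of a real injection.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NO_IRQ (-1)

/* Hypothetical, simplified APIC state; NOT QEMU's APICState. */
typedef struct {
    uint32_t irr[8];       /* 256 pending-request bits, one per vector */
    uint8_t  tpr;          /* task priority register */
    uint8_t  spurious_vec; /* vector reported when the request is masked */
} SketchApic;

/* Mark a vector as pending in the IRR. */
static void sketch_set_irr(SketchApic *s, int vector)
{
    assert(vector >= 0 && vector < 256);
    s->irr[vector >> 5] |= UINT32_C(1) << (vector & 31);
}

/* Return the highest pending vector, or SKETCH_NO_IRQ if the IRR is empty. */
static int sketch_highest_irr(const SketchApic *s)
{
    for (int word = 7; word >= 0; word--) {
        if (!s->irr[word])
            continue;
        for (int bit = 31; bit >= 0; bit--) {
            if (s->irr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
        }
    }
    return SKETCH_NO_IRQ;
}

/*
 * Pick the vector to inject. If the highest pending vector is not above
 * the tpr's priority class, it stays pending and the spurious vector is
 * returned instead (the case suspected above).
 */
static int sketch_get_interrupt(SketchApic *s)
{
    int vector = sketch_highest_irr(s);

    if (vector == SKETCH_NO_IRQ)
        return SKETCH_NO_IRQ;
    if ((vector >> 4) <= (s->tpr >> 4))
        return s->spurious_vec;

    /* Deliverable: consume the request. */
    s->irr[vector >> 5] &= ~(UINT32_C(1) << (vector & 31));
    return vector;
}
```

With vector 0x41 pending and tpr 0x50, this sketch hands back the
spurious vector and leaves 0x41 pending; once the tpr drops to 0x30,
the same call returns 0x41.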

Let's try your idea that we miss a wakeup.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux