Re: [PATCH] kvm: arm/arm64: vgic: Fix the sequence principle about vgic save/restore.

wanghaibin <wanghaibin.wang@xxxxxxxxxx> · Tue, 6 Jun 2017 17:29:54 +0800

On 2017/6/6 16:34, Marc Zyngier wrote:

> On 06/06/17 04:23, wanghaibin wrote:
>> On 2017/6/5 21:56, Marc Zyngier wrote:
>>
>>> On 05/06/17 11:30, wanghaibin wrote:
>>>> At present, taking GICv3 as an example, the vgic state restore path restores the
>>>> ICH_HCR register before the ICH_LRn registers. At that point the ICH_LRn registers
>>>> are still 0, so if ICH_HCR.UIE is set to 1, a large number of unnecessary
>>>> maintenance interrupts will be triggered.
>>>
>>> Is that a theoretical problem? Or something you've actually observed?
>>
>>
>> I observed this problem when booting an Android VM (with 4 vcpus) on my HiSilicon D03 board (which supports 4 LRs).
>> The Android VM fails to boot because virtual interrupts cannot be delivered to the VM in a timely manner.
>>
>> I watched the maintenance interrupt count (/proc/interrupts, GICv3 hwirq:25), and it can go up by 200000+ per second.
>> (Sorry for my unclear wording; this is what I meant by a large number of unnecessary maintenance interrupts.)
> 
> That's really odd, as we disable that interrupt before exiting HYP (you
> should never see that counter increasing). So either your GIC is
> incredibly slow (it fails to retire the interrupt in a timely manner),
> or it is configured as an Edge interrupt (and thus cannot be retired).
> 
> Could you please investigate the last point? Also, do you see warnings
> from the virtual timer (something about unexpected interrupts)?

Yes, just as you said, it's a known defect on the HiSilicon D03 board,
and the maintenance interrupt counter does not increase on the HiSilicon D05 board.

> 
>>
>>>
>>> At the point where we restore the vgic state, interrupts are disabled.
>>> And by the time we enter the guest, we fully expect the HW to be in a
>>> stable state, and no spurious interrupt would be delivered.
>>
>>
>> At the point where we restore the vgic state, it's true that interrupts are disabled (local_irq_disable),
>> but in my opinion, a maintenance interrupt may be pending at the physical redistributor (at that point it can be delivered).
>> It will then be delivered as soon as the current vcpu's context is restored and the guest is entered (at eret, where PSTATE.I may be unmasked).
>> Thus, the vcpu is kicked out immediately, and at the next vgic state restore point the same cycle begins again.
> 
> I understand what happens when the interrupt is observed, but I want to
> understand why it is observed.

For ICH_HCR_EL2.UIE, the spec says:
Underflow Interrupt Enable. Enables the signaling of a maintenance interrupt when the List
registers are empty, or hold only one valid entry.

After commit b40c4892, we clear the ICH_LRn registers when saving the vgic state.
At the next vgic restore point, if ICH_HCR_EL2.UIE = 1 and all ICH_LRn registers are clear,
I think a maintenance interrupt will be triggered, because ICH_HCR is restored earlier than the ICH_LRn registers.
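
To make the ordering argument concrete, here is a rough toy model in Python. It is
only a sketch and not the kernel code: the class and function names are made up for
illustration, and the latch is my assumption that stands in for an interrupt that
cannot be retired once it has been signalled (the edge-configured case you described).
The bit positions (En = bit 0, UIE = bit 1 of ICH_HCR_EL2, State = bits [63:62] of
ICH_LRn_EL2) follow the GICv3 architecture.

```python
# Toy model of the GICv3 underflow condition. NOT kernel code; all names
# below are hypothetical and exist only for this illustration.

ICH_HCR_EN = 1 << 0        # ICH_HCR_EL2.En
ICH_HCR_UIE = 1 << 1       # ICH_HCR_EL2.UIE
LR_STATE_MASK = 0b11 << 62  # ICH_LRn_EL2.State, 0b00 means invalid
LR_STATE_PENDING = 0b01 << 62


class ToyCpuInterface:
    def __init__(self, nr_lr=4):
        self.hcr = 0
        self.lr = [0] * nr_lr       # LRs were cleared on the save path
        self.mi_latched = False     # assumption: once signalled, it stays pending

    def _valid_lrs(self):
        return sum(1 for v in self.lr if v & LR_STATE_MASK)

    def _update(self):
        # Underflow: En and UIE are set, and at most one LR holds a valid entry.
        enabled = (self.hcr & ICH_HCR_EN) and (self.hcr & ICH_HCR_UIE)
        if enabled and self._valid_lrs() <= 1:
            self.mi_latched = True

    def write_hcr(self, value):
        self.hcr = value
        self._update()

    def write_all_lrs(self, values):
        # The condition is only re-evaluated when ICH_HCR is written, so the
        # intermediate states while the LRs are being filled are not modelled.
        self.lr = list(values)


def restore(saved_hcr, saved_lrs, hcr_first):
    """Replay a restore in the given order; return True if an underflow
    maintenance interrupt was signalled at any point."""
    cpu_if = ToyCpuInterface(len(saved_lrs))
    if hcr_first:
        cpu_if.write_hcr(saved_hcr)      # LRs are still all zero here
        cpu_if.write_all_lrs(saved_lrs)
    else:
        cpu_if.write_all_lrs(saved_lrs)
        cpu_if.write_hcr(saved_hcr)      # LRs are already populated
    return cpu_if.mi_latched
```

With four valid saved LRs and UIE set, restoring ICH_HCR first signals the underflow
in this model, while restoring the LRs first does not.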

Thanks.

> 
> Thanks,
> 
> 	M.

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm