Re: kvm-unit-tests: Question about the "no interrupt when timer is disabled" case

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Zenghui,

On 7/25/20 5:50 AM, Zenghui Yu wrote:
> Hi Alex,
>
> [+cc Nianyao]
>
> On 2020/7/24 19:08, Alexandru Elisei wrote:
>> Hi Zenghui,
>>
>> I don't believe this issue can be triggered by a Linux guest. Details below.
>>
>> On 7/23/20 9:56 AM, Zenghui Yu wrote:
>>> Hi Alexandru,
>>>
>>> I've noticed that the timer case will fail in the -stable 4.19 kernel.
>>> The log is as follows:
>>>
>>> FAIL: vtimer-busy-loop: no interrupt when timer is disabled
>>> FAIL: vtimer-busy-loop: interrupt signal no longer pending
>>>
>>> And it's because the related fix [16e604a437c8, "KVM: arm/arm64: vgic:
>>> Reevaluate level sensitive interrupts on enable"] hasn't been backported
>>> to the stable tree.
>>
>> This is not an actual fix (hence no "Fixes" tag), this is more like an improvement
>> of the behaviour of the GIC. Like the patch description says, this can happen even
>> on hardware if the GIC hasn't sampled the device interrupt state (or the device
>> itself hasn't updated it) before the CPU re-enables the interrupt.
>
> Fair enough.
>
>>>
>>> Just out of curiosity, _without_ this fix, had you actually seen the
>>> guest getting into trouble due to an un-retired level-sensitive
>>> interrupt and your patch fixed it? Or this was found by code inspection?
>>
>> This issue was found when running kvm-unit-tests on the model.
>>
>>>
>>> Take the exact vtimer case as an example, is it possible that the Linux
>>> guest would disable the vtimer (the input interrupt line is driven to 0
>>> but the old KVM doesn't take this into account) and potentially hit this
>>> issue? I'm not familiar with it.
>>
>> To trigger this, a guest has to do the following steps:
>>
>> 1. Disable the timer interrupt at the Redistributor level.
>> 2. Trigger the timer interrupt in the timer.
>> 3. Disable the timer entirely (CNT{P,V}_CTL_EL0.ENABLE = 0), which also disables
>> the timer interrupt.
>> 4. Enable the timer interrupt at the Redistributor level.
>>
>> I believe there are two reasons why this will never happen for a Linux guest:
>>
>> - This isn't the way Linux handles interrupts. Furthermore, I don't believe Linux
>> will ever disable a specific interrupt at the irqchip level.
>
> This can at least happen in arch_timer_stop() [arm_arch_timer.c], where
> the disable_percpu_irq() API will disable the interrupt (via irq_mask()
> callback which will in turn disable the interrupt at GIC level by
> programming the ICENABLER0).

Sorry, I missed that. Did a grep for arch_timer_stop(), the function is called
only when a CPU is offlined.

>
> What I'm worried is something like:
>
> 1. Disable the timer interrupt (at RDist level by poking the ICENABLER0,
>    or at CPU level by masking PSTATE.I)

A CPU masking interrupts with PSTATE.I will not trigger the spurious interrupt.
KVM doesn't care if the guest is masking interrupts when it fills the LR
registers, and the interrupt state is updated for level-sensitive hardware
interrupts that are in the LR registers on a guest exit. As long as KVM keeps the
interrupt in a LR register, the interrupt state will be updated at guest exit
(writing to I{C,S}ENABLER0 triggers one).

>
>   [ timer interrupt is made pending, and remains pending in (v)GIC. ]
>
> 2. Disable the timer
> 3. Enable the timer interrupt (at RDist level by poking the ISENABLER0,
>    or at CPU level by unmasking PSTATE.I)
>
>   [ The interrupt is forwarded to (v)CPU, and we end-up re-enabling the
>     timer inside the timer IRQ handler, which may not be as expected. ]
>
> I'm just not sure if this will be a possible scenario in the Linux, and
> is it harmful if this would happen.

I'm not familiar with hotplug, but I assume that a CPU is offlined with a
PSCI_CPU_OFF call. KVM reset the vcpu when it's brought online again, and that
means updating the timer state in kvm_timer_vcpu_load() -> kvm_timer_vcpu_load_gic().

I've been thinking about it, and getting spurious timer interrupts can happen in
another, more common case, when the GIC retires the physical interrupt between
guest exit (when the interrupt state is sampled by KVM) and guest entry. There's
nothing that KVM can do in this case.

>
>> - The timer IRQ handler checks the ISTATUS flag in the timer control register
>> before handling the interrupt. The flag is unset if the timer is disabled.
>
> This actually doesn't match the spec. Arm ARM D13.8.25 has a description
> about the ISTATUS field as below,
>
> "When the value of the ENABLE bit is 0, the ISTATUS field is UNKNOWN."

My bad, when I looked at the Arm ARM I missed that part.

>
> I guess what Nianyao had posted [*] may address the concern above...
>
> [*]
> http://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@xxxxxxxxxxxxx/

I've read Marc's explanation, and I tend to agree. Spurious interrupts can happen
on real hardware too, and there's really nothing that KVM can do about it.

Thanks,

Alex

>
>>
>> I hope my explanation made sense, please chime in if I missed something or you
>> want more details.
>
> Thanks Alex,
> Zenghui
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm




[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux