On 31/01/18 11:23, Christoffer Dall wrote:
> On Wed, Jan 31, 2018 at 12:19 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On 31/01/18 10:05, Christoffer Dall wrote:
>>> On Wed, Jan 31, 2018 at 09:37:31AM +0000, Marc Zyngier wrote:
>>>> On 30/01/18 12:46, Christoffer Dall wrote:
>>>>> When introducing support for irqchip in userspace we needed a way to
>>>>> mask the timer signal to prevent the guest continuously exiting due to a
>>>>> screaming timer.
>>>>>
>>>>> We did this by disabling the corresponding percpu interrupt on the
>>>>> host interrupt controller, because we cannot rely on the host system
>>>>> having a GIC, and therefore cannot make any assumptions about having an
>>>>> active state to hide the timer signal.
>>>>>
>>>>> Unfortunately, when introducing this feature, it became entirely
>>>>> possible that a VCPU which belongs to a VM that has a userspace irqchip
>>>>> can disable the vtimer irq on the host on some physical CPU, and then go
>>>>> away without ever enabling the vimter irq on that physical CPU again.
>>>>
>>>> vtimer
>>>>
>>>>>
>>>>> This means that using irqchips in userspace on a system that also
>>>>> supports running VMs with an in-kernel GIC can prevent forward progress
>>>>> from in-kernel GIC VMs.
>>>>>
>>>>> Later on, when we started taking virtual timer interrupts in the arch
>>>>> timer code, we would also leave this timer state active for userspace
>>>>> irqchip VMs, because we leave it up to a VGIC-enabled guest to
>>>>> deactivate the hardware IRQ using the HW bit in the LR.
>>>>>
>>>>> Both issues are solved by only using the enable/disable trick on systems
>>>>> that do not have a host GIC which supports the active state, because all
>>>>> VMs on such systems must use irqchips in userspace. Systems that have a
>>>>> working GIC with support for an active state use the active state to
>>>>> mask the timer signal for both userspace an in-kernel irqchips.
>>>
>>> this should also be "and"
>>>
>>>>>
>>>>> Cc: Alexander Graf <agraf@xxxxxxx>
>>>>> Cc: <stable@xxxxxxxxxxxxxxx> # v4.12+
>>>>> Fixes: d9e139778376 ("KVM: arm/arm64: Support arch timers with a userspace gic")
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>>> ---
>>>>> This conflicts horribly with everything when applied to either
>>>>> kvmarm/queue or kvmarm/master. Therefore, this patch is written for
>>>>> (and applies to) v4.15 with kvmarm/queue merged and should therefore
>>>>> apply cleanly after v4.16-rc1. An example with this patch applied can
>>>>> be found on kvmarm/temp-for-v4.16-rc2. I plan on sending this along
>>>>> with any other potential fixes post v4.16-rc1.
>>>>>
>>>>>  virt/kvm/arm/arch_timer.c | 77 ++++++++++++++++++++++++++---------------------
>>>>>  1 file changed, 42 insertions(+), 35 deletions(-)
>>>>>
>>>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>>>> index 70268c0bec79..228906ceb722 100644
>>>>> --- a/virt/kvm/arm/arch_timer.c
>>>>> +++ b/virt/kvm/arm/arch_timer.c
>>>>> @@ -35,6 +35,7 @@
>>>>>  static struct timecounter *timecounter;
>>>>>  static unsigned int host_vtimer_irq;
>>>>>  static u32 host_vtimer_irq_flags;
>>>>> +static bool has_gic_active_state;
>>>>>
>>>>>  static const struct kvm_irq_level default_ptimer_irq = {
>>>>>          .irq = 30,
>>>>> @@ -69,25 +70,6 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>>>>>          cancel_work_sync(work);
>>>>>  }
>>>>>
>>>>> -static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
>>>>> -{
>>>>> -        struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>> -
>>>>> -        /*
>>>>> -         * When using a userspace irqchip with the architected timers, we must
>>>>> -         * prevent continuously exiting from the guest, and therefore mask the
>>>>> -         * physical interrupt by disabling it on the host interrupt controller
>>>>> -         * when the virtual level is high, such that the guest can make
>>>>> -         * forward progress. Once we detect the output level being
>>>>> -         * de-asserted, we unmask the interrupt again so that we exit from the
>>>>> -         * guest when the timer fires.
>>>>> -         */
>>>>> -        if (vtimer->irq.level)
>>>>> -                disable_percpu_irq(host_vtimer_irq);
>>>>> -        else
>>>>> -                enable_percpu_irq(host_vtimer_irq, 0);
>>>>> -}
>>>>> -
>>>>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>>>  {
>>>>>          struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
>>>>> @@ -107,8 +89,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>>>                  kvm_timer_update_irq(vcpu, true, vtimer);
>>>>>
>>>>>          if (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>>>> -            unlikely(!irqchip_in_kernel(vcpu->kvm)))
>>>>> -                kvm_vtimer_update_mask_user(vcpu);
>>>>> +            unlikely(!irqchip_in_kernel(vcpu->kvm)) && !has_gic_active_state)
>>>>> +                disable_percpu_irq(host_vtimer_irq);
>>>>
>>>> nit: this is the negated version of the fix you posted earlier
>>>> (0e23cb34af26 in kvmarm/queue), plus a new term... How about rewriting
>>>> this as:
>>>>
>>>> static inline bool userspace_irqchip(struct kvm *kvm)
>>>> {
>>>>         return (static_branch_unlikely(&userspace_irqchip_in_use) &&
>>>>                 unlikely(!irqchip_in_kernel(kvm)));
>>>> }
>>>>
>>>> and this becomes:
>>>>
>>>>         if (userspace_irqchip(vcpu->kvm) && !has_gic_active_state)
>>>>                 [...]
>>>>
>>>> which I find much more readable.
>>>>
>>>> Same for kvm_timer_update_irq():
>>>>
>>>>         if (!userspace_irqchip(vcpu->kvm))
>>>>                 [...]
>>>>
>>>
>>> yes, good idea, I find it more readable too.
>>>
>>>>>
>>>>>          return IRQ_HANDLED;
>>>>>  }
>>>>> @@ -460,13 +442,16 @@ static void set_cntvoff(u64 cntvoff)
>>>>>          kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
>>>>>  }
>>>>>
>>>>> -static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>>>>> +static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>>          struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>>          bool phys_active;
>>>>>          int ret;
>>>>>
>>>>> -        phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
>>>>> +        if (irqchip_in_kernel(vcpu->kvm))
>>>>> +                phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
>>>>> +        else
>>>>> +                phys_active = vtimer->irq.level;
>>>>
>>>> You could use the above helper here too.
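
To make that suggestion concrete: a rough, untested sketch of what
kvm_timer_vcpu_load_gic() could look like with the helper applied (the
only change from the hunk above is replacing the open-coded
irqchip_in_kernel() check with !userspace_irqchip()):

static void kvm_timer_vcpu_load_gic(struct kvm_vcpu *vcpu)
{
        struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
        bool phys_active;
        int ret;

        /*
         * With a userspace irqchip there is no vgic mapping to query,
         * so fall back to the emulated line level.
         */
        if (!userspace_irqchip(vcpu->kvm))
                phys_active = kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
        else
                phys_active = vtimer->irq.level;

        ret = irq_set_irqchip_state(host_vtimer_irq,
                                    IRQCHIP_STATE_ACTIVE,
                                    phys_active);
        WARN_ON(ret);
}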
>>>>
>>>
>>> yes
>>>
>>>>>
>>>>>          ret = irq_set_irqchip_state(host_vtimer_irq,
>>>>>                                      IRQCHIP_STATE_ACTIVE,
>>>>> @@ -474,9 +459,24 @@ static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>>>>>          WARN_ON(ret);
>>>>>  }
>>>>>
>>>>> -static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
>>>>> +static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>> -        kvm_vtimer_update_mask_user(vcpu);
>>>>> +        struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>> +
>>>>> +        /*
>>>>> +         * When using a userspace irqchip with the architected timers and a
>>>>> +         * host interrupt controller that doesn't support an active state, we
>>>>> +         * must still we must prevent continuously exiting from the guest, and
>>>
>>> and here I should s/we must//
>>>
>>>>> +         * therefore mask the physical interrupt by disabling it on the host
>>>>> +         * interrupt controller when the virtual level is high, such that the
>>>>> +         * guest can make forward progress. Once we detect the output level
>>>>> +         * being de-asserted, we unmask the interrupt again so that we exit
>>>>> +         * from the guest when the timer fires.
>>>>> +         */
>>>>> +        if (vtimer->irq.level)
>>>>> +                disable_percpu_irq(host_vtimer_irq);
>>>>> +        else
>>>>> +                enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>>>>  }
>>>>>
>>>>>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>>>>> @@ -487,10 +487,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>>>>>          if (unlikely(!timer->enabled))
>>>>>                  return;
>>>>>
>>>>> -        if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
>>>>> -                kvm_timer_vcpu_load_user(vcpu);
>>>>> +        if (has_gic_active_state)
>>>>
>>>> likely()?
>>>>
>>>
>>> yes, will be fixed if using a static key.
>>>
>>>>> +                kvm_timer_vcpu_load_gic(vcpu);
>>>>>          else
>>>>> -                kvm_timer_vcpu_load_vgic(vcpu);
>>>>> +                kvm_timer_vcpu_load_nogic(vcpu);
>>>>>
>>>>>          set_cntvoff(vtimer->cntvoff);
>>>>>
>>>>> @@ -555,18 +555,23 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>>          struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>>>>>
>>>>> -        if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
>>>>> -                __timer_snapshot_state(vtimer);
>>>>> -                if (!kvm_timer_should_fire(vtimer)) {
>>>>> -                        kvm_timer_update_irq(vcpu, false, vtimer);
>>>>> -                        kvm_vtimer_update_mask_user(vcpu);
>>>>> -                }
>>>>> +        __timer_snapshot_state(vtimer);
>>>>> +        if (!kvm_timer_should_fire(vtimer)) {
>>>>> +                kvm_timer_update_irq(vcpu, false, vtimer);
>>>>> +                if (!has_gic_active_state)
>>>>
>>>> unlikely()?
>>>>
>>>
>>> same
>>>
>>>>> +                        enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>>
>>> So, reading your reply made me think. Shouldn't this actually be:
>>>
>>>         if (static_key_likely(has_gic_active_state))
>>>                 enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>>>         else
>>>                 irq_set_irqchip_state(host_vtimer_irq, IRQCHIP_STATE_ACTIVE, false);
>>
>> Shouldn't it be the other way around? Use the active state when we have
>> a GIC and mask if we don't?
>>
>
> Yes, it should. Doh.
>
>>>
>>> ... which makes me want to wrap the irqchip call in a small wrapper
>>> called from here and from load(). What do you think?
>>
>> Indeed. Some operation that "does the right thing", no matter what
>> bizarre configuration we're in. It would definitely help understanding
>> the flow which has become a bit unwieldy.
>>
>
> I tried reworking the load function to first branch off
> userspace_irqchip() and then call the sub-function from here, but it
> was no more readable, so I've done something in between for v2. Stay
> tuned...
>
> Did I mention that I hate this feature, which keeps breaking, and
> which really isn't covered by a simple kvm-unit-test script?

+1. I also contend that nobody is using it. Otherwise the breakages
would have been noticed on every kernel release...

        M.
--
Jazz is not dead. It just smells funny...