Re: [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded

André Przywara <andre.przywara@xxxxxxx> · Wed, 8 Apr 2020 17:50:09 +0100

On 08/04/2020 15:19, Marc Zyngier wrote:

Hi Marc,

> On 2020-04-08 13:13, André Przywara wrote:
>> On 08/04/2020 11:07, Marc Zyngier wrote:
>>
>> Hi Marc,
>>
>>> Hi James,
>>>
>>> Thanks for looking into this.
>>>
>>> On Mon,  6 Apr 2020 16:03:55 +0100
>>> James Morse <james.morse@xxxxxxx> wrote:
>>>
>>>> kvm_arch_timer_get_input_level() needs to get the arch_timer_context
>>>> for a particular vcpu, and uses kvm_get_running_vcpu() to find it.
>>>>
>>>> kvm_arch_timer_get_input_level() may be called to handle a user-space
>>>> write to the redistributor, where the vcpu is not loaded. This causes
>>>> kvm_get_running_vcpu() to return NULL:
>>>> | Unable to handle kernel paging request at virtual address 0000000000001ec0
>>>> | Mem abort info:
>>>> |   ESR = 0x96000004
>>>> |   EC = 0x25: DABT (current EL), IL = 32 bits
>>>> |   SET = 0, FnV = 0
>>>> |   EA = 0, S1PTW = 0
>>>> | Data abort info:
>>>> |   ISV = 0, ISS = 0x00000004
>>>> |   CM = 0, WnR = 0
>>>> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000
>>>> | [0000000000001ec0] pgd=0000000000000000
>>>> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables
>>>> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30
>>>> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019
>>>> | pstate: 00000085 (nzcv daIf -PAN -UAO)
>>>> | pc : kvm_arch_timer_get_input_level+0x1c/0x68
>>>> | lr : kvm_arch_timer_get_input_level+0x1c/0x68
>>>>
>>>> | Call trace:
>>>> |  kvm_arch_timer_get_input_level+0x1c/0x68
>>>> |  vgic_get_phys_line_level+0x3c/0x90
>>>> |  vgic_mmio_write_senable+0xe4/0x130
>>>> |  vgic_uaccess+0xe0/0x100
>>>> |  vgic_v3_redist_uaccess+0x5c/0x80
>>>> |  vgic_v3_attr_regs_access+0xf0/0x200
>>>> |  vgic_v3_set_attr+0x234/0x250
>>>> |  kvm_device_ioctl_attr+0xa4/0xf8
>>>> |  kvm_device_ioctl+0x7c/0xc0
>>>> |  ksys_ioctl+0x1fc/0xc18
>>>> |  __arm64_sys_ioctl+0x24/0x30
>>>> |  do_el0_svc+0x7c/0x148
>>>> |  el0_sync_handler+0x138/0x258
>>>> |  el0_sync+0x140/0x180
>>>> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001)
>>>> | ---[ end trace 81287612d93f1e70 ]---
>>>> | note: qemu-system-aar[2615] exited with preempt_count 1
>>>>
>>>> Loading the vcpu doesn't make a lot of sense for handling a device
>>>> ioctl(), so instead pass the vcpu through to
>>>> kvm_arch_timer_get_input_level(). It's not clear that an intid makes
>>>> much sense without the paired vcpu.
>>>
>>> I don't fully agree with the analysis. Remember we are looking at the
>>> state of the physical interrupt associated with a virtual interrupt, so
>>> the vcpu doesn't quite make sense here if it isn't loaded.
>>
>> But wasn't it that this function is meant to specifically deal with this
>> *without* going to the hardware (which is costly, hence this
>> optimisation)? Because for the timer we *can* work out the logical IRQ
>> line state by examining our saved state? And this is what we do in
>> kvm_timer_should_fire(), when timer_ctx->loaded is false.
> 
> Yes, but that's just a specialization of a more generic interface, which is
> "inspect the state of this *physical* intid". The fact that we are able
> to do it in a special way for the timer doesn't change the nature of the
> interface.
>
>> To me this sounds like the right thing to do in this situation:
>> the VCPU (and the timer) is not loaded, so we check our saved state and
>> construct the logical line level. We just need a valid VCPU struct to
>> achieve this, and hope for the virtual timer to be already initialised.
>>
>> Do I miss something here?
> 
> Yes. You are missing that the *interface* is generic, and you can replace
> it with anything you want. Case in point, what we do when get_input_level
> is NULL.
> 
>> Also, to me it sounds like the interface for this function is slightly
>> lacking, because just an intid is not enough to uniquely identify an
>> IRQ. It was just fine so far because of this special use case.
> 
> This is a *physical* intid. 

Wait, I am confused, the type declaration in struct vgic_irq says:
...
        bool (*get_input_level)(int vintid);
                                   ^^^
Also in vgic.c:vgic_get_phys_line_level() we call
irq->get_input_level(irq->intid), which is the virtual intid.

I do see that the physical intid makes more sense here (in the spirit
of: provide a shortcut for poking the GIC for the associated hwirq), but
shouldn't we then pass at least irq->hwintid (which just happens to be
the same in the arch timer case)?
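
To make the question concrete, here is a stand-alone user-space sketch of
how I read the fast path and its fallback. This is not the kernel code:
the struct is a cut-down model of struct vgic_irq, and every model_* name
(including the fixed return values) is made up for illustration. The only
point is which INTID gets handed to the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical model of struct vgic_irq: only the fields discussed. */
struct vgic_irq_model {
	uint32_t intid;    /* virtual INTID, as seen by the guest */
	uint32_t hwintid;  /* physical INTID of the mapped hardware IRQ */
	bool hw;           /* true if mapped to a physical interrupt */
	bool (*get_input_level)(int intid); /* optional fast path, may be NULL */
};

/* Hypothetical stand-in for querying the physical GIC; fixed value here. */
static bool model_query_hw_pending(uint32_t hwintid)
{
	(void)hwintid;
	return false;
}

/* Hypothetical fast path: pretends the line for INTID 27 is high. */
static bool model_timer_input_level(int intid)
{
	return intid == 27;
}

/* Use the fast path if one is registered, otherwise ask the "hardware". */
static bool model_get_phys_line_level(const struct vgic_irq_model *irq)
{
	assert(irq != NULL && irq->hw);

	if (irq->get_input_level)
		/* Passing hwintid here is the suggestion, not current behaviour. */
		return irq->get_input_level((int)irq->hwintid);

	return model_query_hw_pending(irq->hwintid);
}
```

With intid == hwintid, as in the arch timer case, either argument gives
the same answer, which would explain why the difference has not mattered
so far.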

> It can only mean one single thing, and it
> only makes sense in the context of a vcpu if the device gets
> context-switched.

I see, it's "this PPI on the current CPU, or this SPI/LPI in the
system", and this call is always expected to happen during a context
switch. And then indeed passing a VCPU doesn't make sense.
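
For my own understanding, a small user-space model of that context
dependency. Again, this is only a sketch with invented model_* names, not
a proposed fix: a single pointer stands in for what kvm_get_running_vcpu()
returns, and the lookup simply reports that it has no answer when nothing
is loaded, which is the situation the user-space redistributor write in
the oops above ends up in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vcpu state: just the saved virtual timer line level. */
struct vcpu_model {
	bool vtimer_line;
};

/* Stand-in for the per-CPU "running vcpu" pointer; NULL when none is loaded. */
static struct vcpu_model *model_running_vcpu;

static void model_vcpu_load(struct vcpu_model *v)
{
	model_running_vcpu = v;
}

static void model_vcpu_put(void)
{
	model_running_vcpu = NULL;
}

/* Returns true and fills *level only when a vcpu context exists. */
static bool model_timer_level_in_context(bool *level)
{
	assert(level != NULL);

	if (!model_running_vcpu)
		return false; /* no context, e.g. a device ioctl() from user space */

	*level = model_running_vcpu->vtimer_line;
	return true;
}
```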

Thanks for the explanation, I guess we should clarify this in the code
then (because my suggestion was based on the idea that this was a
virtual IRQ).

Cheers,
Andre

> I can remove the above fast path entirely, and everything will still work
> the same way, without having to pass any vcpu, because the *context* is
> what matters.
> 
> Thanks,
> 
>         M.
