Re: [PATCH] KVM: arm64: arch_timer shouldn't assume the vcpu is loaded

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09/04/2020 09:08, Marc Zyngier wrote:

Hi Marc,

> On Wed, 8 Apr 2020 17:50:09 +0100
> André Przywara <andre.przywara@xxxxxxx> wrote:
> 
>> On 08/04/2020 15:19, Marc Zyngier wrote:
>>
>> Hi Marc,
>>
>>> On 2020-04-08 13:13, André Przywara wrote:  
>>>> On 08/04/2020 11:07, Marc Zyngier wrote:
>>>>
>>>> Hi Marc,
>>>>  
>>>>> Hi James,
>>>>>
>>>>> Thanks for looking into this.
>>>>>
>>>>> On Mon,  6 Apr 2020 16:03:55 +0100
>>>>> James Morse <james.morse@xxxxxxx> wrote:
>>>>>  
>>>>>> kvm_arch_timer_get_input_level() needs to get the arch_timer_context
>>>>>> for
>>>>>> a particular vcpu, and uses kvm_get_running_vcpu() to find it.
>>>>>>
>>>>>> kvm_arch_timer_get_input_level() may be called to handle a user-space
>>>>>> write to the redistributor, where the vcpu is not loaded. This causes
>>>>>> kvm_get_running_vcpu() to return NULL:
>>>>>> | Unable to handle kernel paging request at virtual address
>>>>>> 0000000000001ec0
>>>>>> | Mem abort info:
>>>>>> |   ESR = 0x96000004
>>>>>> |   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>> |   SET = 0, FnV = 0
>>>>>> |   EA = 0, S1PTW = 0
>>>>>> | Data abort info:
>>>>>> |   ISV = 0, ISS = 0x00000004
>>>>>> |   CM = 0, WnR = 0
>>>>>> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000
>>>>>> | [0000000000001ec0] pgd=0000000000000000
>>>>>> | Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>>>>> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables
>>>>>> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30
>>>>>> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS
>>>>>> 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019
>>>>>> | pstate: 00000085 (nzcv daIf -PAN -UAO)
>>>>>> | pc : kvm_arch_timer_get_input_level+0x1c/0x68
>>>>>> | lr : kvm_arch_timer_get_input_level+0x1c/0x68
>>>>>>
>>>>>> | Call trace:
>>>>>> |  kvm_arch_timer_get_input_level+0x1c/0x68
>>>>>> |  vgic_get_phys_line_level+0x3c/0x90
>>>>>> |  vgic_mmio_write_senable+0xe4/0x130
>>>>>> |  vgic_uaccess+0xe0/0x100
>>>>>> |  vgic_v3_redist_uaccess+0x5c/0x80
>>>>>> |  vgic_v3_attr_regs_access+0xf0/0x200
>>>>>> |  nvgic_v3_set_attr+0x234/0x250
>>>>>> |  kvm_device_ioctl_attr+0xa4/0xf8
>>>>>> |  kvm_device_ioctl+0x7c/0xc0
>>>>>> |  ksys_ioctl+0x1fc/0xc18
>>>>>> |  __arm64_sys_ioctl+0x24/0x30
>>>>>> |  do_el0_svc+0x7c/0x148
>>>>>> |  el0_sync_handler+0x138/0x258
>>>>>> |  el0_sync+0x140/0x180
>>>>>> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001)
>>>>>> | ---[ end trace 81287612d93f1e70 ]---
>>>>>> | note: qemu-system-aar[2615] exited with preempt_count 1
>>>>>>
>>>>>> Loading the vcpu doesn't make a lot of sense for handling a device
>>>>>> ioctl(),
>>>>>> so instead pass the vcpu through to
>>>>>> kvm_arch_timer_get_input_level(). Its
>>>>>> not clear that an intid makes much sense without the paired vcpu.  
>>>>>
>>>>> I don't fully agree with the analysis, Remember we are looking at the
>>>>> state of the physical interrupt associated with a virtual interrupt, so
>>>>> the vcpu doesn't quite make sense here if it isn't loaded.  
>>>>
>>>> But wasn't it that this function is meant to specifically deal with this
>>>> *without* going to the hardware (which is costly, hence this
>>>> optimisation)? Because for the timer we *can* work out the logical IRQ
>>>> line state by examining our saved state? And this is what we do in
>>>> kvm_timer_should_fire(), when timer_ctx->loaded is false.  
>>>
>>> Yes, but that's just a specialization of a more generic interface, which is
>>> "inspect the state of this *physical* intid". The fact that we are able
>>> to do
>>> it in a special way for the timer doesn't change the nature of the
>>> interface.  
>>
>>
>>>   
>>>> Which for me this sounds like the right thing to do in this situation:
>>>> the VCPU (and the timer) is not loaded, so we check our saved state and
>>>> construct the logical line level. We just need a valid VCPU struct to
>>>> achieve this, and hope for the virtual timer to be already initialised.
>>>>
>>>> Do I miss something here?  
>>>
>>> Yes. You are missing that the *interface* is generic, and you can replace
>>> it with anything you want. Case in point, what we do when get_input_level
>>> is NULL.
>>>   
>>>> Also to me it sound like the interface for this function is slightly
>>>> lacking, because just an intid is not enough to uniquely identify an
>>>> IRQ. It was just fine so far because of this special use case.  
>>>
>>> This is a *physical* intid.   
>>
>> Wait, I am confused, the type declaration in struct vgic_irq says:
>> ...
>>         bool (*get_input_level)(int vintid);
>>                                    ^^^
>> Also in vgic.c:vgic_get_phys_line_level() we call
>> irq->get_input_level(irq->intid), which is the virtual intid.
> 
> Yeah, that's not great indeed. It is a cunning shortcut to get to the
> timer, but that really should be the host irq.

>> But I see that the physical intid makes more sense here (in the spirit
>> of: provide a shortcut for poking the GIC for the associated hwirq), but
>> shouldn't we then pass at least irq->hwintid (which just happens to be
>> the same in the arch timer case)?
> 
> hwintid isn't really something we should consider, as it is an
> implementation detail of the physical GIC and list registers. It is
> too low-level to be generally useful. The host_irq field, on the other
> hand, is a better information source, and the timer already has this
> stashed.

That sounds indeed more logical, given that we use that number for the
default handling (irq_get_irqchip_state()).

> Overall, we could just pass the pointer to the vgic_irq, and let the
> helper do whatever it needs to sort it out. After all, it is supposed
> to be faster than going to the GIC, so we can have a bit of leeway here.

That sounds tempting, but didn't we consider struct vgic_irq a data
structure private to the VGIC? I remember we jumped through some hoops
to get rid of internal VGIC data in arch_timer.c, for instance.
A quick grep for vgic_irq returns only hits in the virt/kvm/arm/vgic
directory.

So maybe we should not try to sell this get_input_level() as a generic
solution, but demote it to what it really is: a "cunning shortcut"(TM)
to optimise the arch timer?

What about we change the interface to use the Linux IRQ number, rewrite
the implementation in arch_timer.c to match that against
host_[vp]timer_irq and adjust the comments accordingly?
Then the next user (shall there be one) can extend this interface as needed.

Plus your code to provide user space accessors.

Cheers,
Andre

> Not a big deal, as this isn't the part that is broken ATM.
> 
> Thanks,
> 
> 	M.
> 

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm




[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux