On 09/04/2020 09:08, Marc Zyngier wrote: Hi Marc, > On Wed, 8 Apr 2020 17:50:09 +0100 > André Przywara <andre.przywara@xxxxxxx> wrote: > >> On 08/04/2020 15:19, Marc Zyngier wrote: >> >> Hi Marc, >> >>> On 2020-04-08 13:13, André Przywara wrote: >>>> On 08/04/2020 11:07, Marc Zyngier wrote: >>>> >>>> Hi Marc, >>>> >>>>> Hi James, >>>>> >>>>> Thanks for looking into this. >>>>> >>>>> On Mon, 6 Apr 2020 16:03:55 +0100 >>>>> James Morse <james.morse@xxxxxxx> wrote: >>>>> >>>>>> kvm_arch_timer_get_input_level() needs to get the arch_timer_context >>>>>> for >>>>>> a particular vcpu, and uses kvm_get_running_vcpu() to find it. >>>>>> >>>>>> kvm_arch_timer_get_input_level() may be called to handle a user-space >>>>>> write to the redistributor, where the vcpu is not loaded. This causes >>>>>> kvm_get_running_vcpu() to return NULL: >>>>>> | Unable to handle kernel paging request at virtual address >>>>>> 0000000000001ec0 >>>>>> | Mem abort info: >>>>>> | ESR = 0x96000004 >>>>>> | EC = 0x25: DABT (current EL), IL = 32 bits >>>>>> | SET = 0, FnV = 0 >>>>>> | EA = 0, S1PTW = 0 >>>>>> | Data abort info: >>>>>> | ISV = 0, ISS = 0x00000004 >>>>>> | CM = 0, WnR = 0 >>>>>> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000 >>>>>> | [0000000000001ec0] pgd=0000000000000000 >>>>>> | Internal error: Oops: 96000004 [#1] PREEMPT SMP >>>>>> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables >>>>>> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30 >>>>>> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS >>>>>> 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019 >>>>>> | pstate: 00000085 (nzcv daIf -PAN -UAO) >>>>>> | pc : kvm_arch_timer_get_input_level+0x1c/0x68 >>>>>> | lr : kvm_arch_timer_get_input_level+0x1c/0x68 >>>>>> >>>>>> | Call trace: >>>>>> | kvm_arch_timer_get_input_level+0x1c/0x68 >>>>>> | vgic_get_phys_line_level+0x3c/0x90 >>>>>> | vgic_mmio_write_senable+0xe4/0x130 >>>>>> | vgic_uaccess+0xe0/0x100 >>>>>> | vgic_v3_redist_uaccess+0x5c/0x80 >>>>>> | vgic_v3_attr_regs_access+0xf0/0x200 >>>>>> | nvgic_v3_set_attr+0x234/0x250 >>>>>> | kvm_device_ioctl_attr+0xa4/0xf8 >>>>>> | kvm_device_ioctl+0x7c/0xc0 >>>>>> | ksys_ioctl+0x1fc/0xc18 >>>>>> | __arm64_sys_ioctl+0x24/0x30 >>>>>> | do_el0_svc+0x7c/0x148 >>>>>> | el0_sync_handler+0x138/0x258 >>>>>> | el0_sync+0x140/0x180 >>>>>> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001) >>>>>> | ---[ end trace 81287612d93f1e70 ]--- >>>>>> | note: qemu-system-aar[2615] exited with preempt_count 1 >>>>>> >>>>>> Loading the vcpu doesn't make a lot of sense for handling a device >>>>>> ioctl(), >>>>>> so instead pass the vcpu through to >>>>>> kvm_arch_timer_get_input_level(). Its >>>>>> not clear that an intid makes much sense without the paired vcpu. >>>>> >>>>> I don't fully agree with the analysis, Remember we are looking at the >>>>> state of the physical interrupt associated with a virtual interrupt, so >>>>> the vcpu doesn't quite make sense here if it isn't loaded. >>>> >>>> But wasn't it that this function is meant to specifically deal with this >>>> *without* going to the hardware (which is costly, hence this >>>> optimisation)? Because for the timer we *can* work out the logical IRQ >>>> line state by examining our saved state? And this is what we do in >>>> kvm_timer_should_fire(), when timer_ctx->loaded is false. >>> >>> Yes, but that's just a specialization of a more generic interface, which is >>> "inspect the state of this *physical* intid". The fact that we are able >>> to do >>> it in a special way for the timer doesn't change the nature of the >>> interface. >> >> >>> >>>> Which for me this sounds like the right thing to do in this situation: >>>> the VCPU (and the timer) is not loaded, so we check our saved state and >>>> construct the logical line level. We just need a valid VCPU struct to >>>> achieve this, and hope for the virtual timer to be already initialised. >>>> >>>> Do I miss something here? >>> >>> Yes. You are missing that the *interface* is generic, and you can replace >>> it with anything you want. Case in point, what we do when get_input_level >>> is NULL. >>> >>>> Also to me it sound like the interface for this function is slightly >>>> lacking, because just an intid is not enough to uniquely identify an >>>> IRQ. It was just fine so far because of this special use case. >>> >>> This is a *physical* intid. >> >> Wait, I am confused, the type declaration in struct vgic_irq says: >> ... >> bool (*get_input_level)(int vintid); >> ^^^ >> Also in vgic.c:vgic_get_phys_line_level() we call >> irq->get_input_level(irq->intid), which is the virtual intid. > > Yeah, that's not great indeed. It is a cunning shortcut to get to the > timer, but that really should be the host irq. >> But I see that the physical intid makes more sense here (in the spirit >> of: provide a shortcut for poking the GIC for the associated hwirq), but >> shouldn't we then pass at least irq->hwintid (which just happens to be >> the same in the arch timer case)? > > hwintid isn't really something we should consider, as it is an > implementation detail of the physical GIC and list registers. It is > too low-level to be generally useful. The host_irq field, on the other > hand, is a better information source, and the timer already has this > stashed. That sounds indeed more logical, given that we use that number for the default handling (irq_get_irqchip_state()). > Overall, we could just pass the pointer to the vgic_irq, and let the > helper do whatever it needs to sort it out. After all, it is supposed > to be faster than going to the GIC, so we can have a bit of leeway here. That sounds tempting, but didn't we consider struct vgic_irq a data structure private to the VGIC? I remember we jumped through some hoops to get rid of internal VGIC data in arch_timer.c, for instance. A quick grep for vgic_irq returns only hits in the virt/kvm/arm/vgic directory. So maybe we should not try to sell this get_input_level() as a generic solution, but demote it to what it really is: a "cunning shortcut"(TM) to optimise the arch timer? What about we change the interface to use the Linux IRQ number, rewrite the implementation in arch_timer.c to match that against host_[vp]timer_irq and adjust the comments accordingly? Then the next user (shall there be one) can extend this interface as needed. Plus your code to provide user space accessors. Cheers, Andre > Not a big deal, as this isn't the part that is broken ATM. > > Thanks, > > M. > _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm