On 08/04/2020 15:19, Marc Zyngier wrote: Hi Marc, > On 2020-04-08 13:13, André Przywara wrote: >> On 08/04/2020 11:07, Marc Zyngier wrote: >> >> Hi Marc, >> >>> Hi James, >>> >>> Thanks for looking into this. >>> >>> On Mon, 6 Apr 2020 16:03:55 +0100 >>> James Morse <james.morse@xxxxxxx> wrote: >>> >>>> kvm_arch_timer_get_input_level() needs to get the arch_timer_context >>>> for >>>> a particular vcpu, and uses kvm_get_running_vcpu() to find it. >>>> >>>> kvm_arch_timer_get_input_level() may be called to handle a user-space >>>> write to the redistributor, where the vcpu is not loaded. This causes >>>> kvm_get_running_vcpu() to return NULL: >>>> | Unable to handle kernel paging request at virtual address >>>> 0000000000001ec0 >>>> | Mem abort info: >>>> | ESR = 0x96000004 >>>> | EC = 0x25: DABT (current EL), IL = 32 bits >>>> | SET = 0, FnV = 0 >>>> | EA = 0, S1PTW = 0 >>>> | Data abort info: >>>> | ISV = 0, ISS = 0x00000004 >>>> | CM = 0, WnR = 0 >>>> | user pgtable: 4k pages, 48-bit VAs, pgdp=000000003cbf9000 >>>> | [0000000000001ec0] pgd=0000000000000000 >>>> | Internal error: Oops: 96000004 [#1] PREEMPT SMP >>>> | Modules linked in: r8169 realtek efivarfs ip_tables x_tables >>>> | CPU: 1 PID: 2615 Comm: qemu-system-aar Not tainted 5.6.0-rc7 #30 >>>> | Hardware name: Marvell mvebu_armada-37xx/mvebu_armada-37xx, BIOS >>>> 2018.03-devel-18.12.3-gc9aa92c-armbian 02/20/2019 >>>> | pstate: 00000085 (nzcv daIf -PAN -UAO) >>>> | pc : kvm_arch_timer_get_input_level+0x1c/0x68 >>>> | lr : kvm_arch_timer_get_input_level+0x1c/0x68 >>>> >>>> | Call trace: >>>> | kvm_arch_timer_get_input_level+0x1c/0x68 >>>> | vgic_get_phys_line_level+0x3c/0x90 >>>> | vgic_mmio_write_senable+0xe4/0x130 >>>> | vgic_uaccess+0xe0/0x100 >>>> | vgic_v3_redist_uaccess+0x5c/0x80 >>>> | vgic_v3_attr_regs_access+0xf0/0x200 >>>> | nvgic_v3_set_attr+0x234/0x250 >>>> | kvm_device_ioctl_attr+0xa4/0xf8 >>>> | kvm_device_ioctl+0x7c/0xc0 >>>> | ksys_ioctl+0x1fc/0xc18 >>>> | __arm64_sys_ioctl+0x24/0x30 >>>> | do_el0_svc+0x7c/0x148 >>>> | el0_sync_handler+0x138/0x258 >>>> | el0_sync+0x140/0x180 >>>> | Code: 910003fd f9000bf3 2a0003f3 97ff650c (b95ec001) >>>> | ---[ end trace 81287612d93f1e70 ]--- >>>> | note: qemu-system-aar[2615] exited with preempt_count 1 >>>> >>>> Loading the vcpu doesn't make a lot of sense for handling a device >>>> ioctl(), >>>> so instead pass the vcpu through to >>>> kvm_arch_timer_get_input_level(). Its >>>> not clear that an intid makes much sense without the paired vcpu. >>> >>> I don't fully agree with the analysis, Remember we are looking at the >>> state of the physical interrupt associated with a virtual interrupt, so >>> the vcpu doesn't quite make sense here if it isn't loaded. >> >> But wasn't it that this function is meant to specifically deal with this >> *without* going to the hardware (which is costly, hence this >> optimisation)? Because for the timer we *can* work out the logical IRQ >> line state by examining our saved state? And this is what we do in >> kvm_timer_should_fire(), when timer_ctx->loaded is false. > > Yes, but that's just a specialization of a more generic interface, which is > "inspect the state of this *physical* intid". The fact that we are able > to do > it in a special way for the timer doesn't change the nature of the > interface. > >> Which for me this sounds like the right thing to do in this situation: >> the VCPU (and the timer) is not loaded, so we check our saved state and >> construct the logical line level. We just need a valid VCPU struct to >> achieve this, and hope for the virtual timer to be already initialised. >> >> Do I miss something here? > > Yes. You are missing that the *interface* is generic, and you can replace > it with anything you want. Case in point, what we do when get_input_level > is NULL. > >> Also to me it sound like the interface for this function is slightly >> lacking, because just an intid is not enough to uniquely identify an >> IRQ. It was just fine so far because of this special use case. > > This is a *physical* intid. Wait, I am confused, the type declaration in struct vgic_irq says: ... bool (*get_input_level)(int vintid); ^^^ Also in vgic.c:vgic_get_phys_line_level() we call irq->get_input_level(irq->intid), which is the virtual intid. But I see that the physical intid makes more sense here (in the spirit of: provide a shortcut for poking the GIC for the associated hwirq), but shouldn't we then pass at least irq->hwintid (which just happens to be the same in the arch timer case)? > It can only mean one single thing, and it > only makes sense in the context of a vcpu if the device gets > context-switched. I see, it's "this PPI on the current CPU, or this SPI/LPI in the system", and this call is always expected to happen during a context switch. And then indeed passing a VCPU doesn't make sense. Thanks for the explanation, I guess we should clarify this in the code then (because my suggestion was based on the idea that this is was a virtual IRQ). Cheers, Andre > I can remove the above fast path entirely, and everything will still work > the same way, without having to pass any vcpu, because the *context* is > what matters. > > Thanks, > > M. _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm