Hi Eric,

On Sun, 09 May 2021 18:00:04 +0100,
Auger Eric <eric.auger@xxxxxxxxxx> wrote:
> 
> Hi,
> 
> On 5/7/21 1:02 PM, Marc Zyngier wrote:
> > On Fri, 07 May 2021 10:58:23 +0100,
> > Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx> wrote:
> >> 
> >> Hi Marc,
> >> 
> >> Thanks for your quick reply.
> >> 
> >> On 2021/5/7 17:03, Marc Zyngier wrote:
> >>> On Fri, 07 May 2021 06:57:04 +0100,
> >>> Shaokun Zhang <zhangshaokun@xxxxxxxxxxxxx> wrote:
> >>>> 
> >>>> [This letter comes from Nianyao Tang]
> >>>> 
> >>>> Hi,
> >>>> 
> >>>> Using GICv4/4.1 and the MSI capability, a guest VF driver that
> >>>> requires 3 vectors and enables MSI causes the guest to get stuck.
> >>> 
> >>> Stuck how?
> >> 
> >> The guest serial console no longer responds and the guest network
> >> goes down.
> >> 
> >>> 
> >>>> QEMU gets the number of interrupts from the Multiple Message Capable
> >>>> field set by the guest. This field is aligned to a power of 2 (if a
> >>>> function requires 3 vectors, it initializes it to 2).
> >>> 
> >>> So I guess this is a MultiMSI device with 4 vectors, right?
> >>> 
> >> 
> >> Yes, it can support a maximum of 32 MSI interrupts, and the VF driver
> >> only uses 3 MSIs.
> >> 
> >>>> However, the guest driver only sends 3 MAPI commands to the vITS, so
> >>>> only 3 ITE entries are recorded in the host. VFIO initializes MSI
> >>>> interrupts using the number of interrupts (4) provided by QEMU. When
> >>>> it comes to the 4th MSI, which has no ITE in the vITS,
> >>>> irq_bypass_register_producer() fails to __connect the producer and
> >>>> consumer because find_ite fails, and the guest is not resumed.
> >>> 
> >>> Let me rephrase this to check that I understand it:
> >>> - The device has 4 vectors
> >>> - The guest only creates mappings for 3 of them
> >>> - VFIO calls kvm_vgic_v4_set_forwarding() for each vector
> >>> - KVM doesn't have a mapping for the 4th vector and returns an error
> >>> - VFIO disables this 4th vector
> >>> 
> >>> Is that correct? If yes, I don't understand why that impacts the guest
> >>> at all.
> >>> From what I can see, vfio_msi_set_vector_signal() just prints
> >>> a message on the console and carries on.
> >>> 
> >> 
> >> Function calls:
> >>   --> vfio_msi_set_vector_signal
> >>     --> irq_bypass_register_producer
> >>       --> __connect
> >> 
> >> In __connect, add_producer eventually calls kvm_vgic_v4_set_forwarding()
> >> and fails to find the 4th mapping. When add_producer fails, __connect
> >> does not call cons->start, which would call kvm_arch_irq_bypass_start()
> >> and then kvm_arm_resume_guest().
> > 
> > [+Eric, who wrote the irq_bypass infrastructure.]
> > 
> > Ah, so the guest is actually paused, not in a livelock situation
> > (which is how I interpreted "stuck").
> > 
> > I think we should handle this case gracefully, as there should be no
> > expectation that the guest will be using this interrupt. Given that
> > VFIO seems to be pretty unfazed when a producer fails, I'm tempted to
> > do the same thing and restart the guest.
> > 
> > Also, __disconnect doesn't care about errors, so why should __connect
> > have this odd behaviour?
> 
> __disconnect() does not care, as we should always succeed in tearing
> things down; the del_* ops are void functions. Conversely, setting up
> the bypass can fail.
> 
> Effectively,
> a979a6aa009f ("irqbypass: do not start cons/prod when failed connect")
> needs to be reverted.
> 
> I agree the kerneldoc comments in linux/irqbypass.h could be improved
> to better explain the role of the stop/start callbacks and warn about
> their potential global impact.

Yup. It also begs the question of why we have producer callbacks, as
nobody seems to use them.

> wrt the case above, "in __connect, add_producer finally calls
> kvm_vgic_v4_set_forwarding and fails to get the 4th mapping", shouldn't
> we succeed in that case?

From a KVM perspective, we can't return success because there is no
guest LPI that matches the input signal. And such a failure seems to be
expected by the VFIO code, which just prints a message on the console
and sets the producer token to NULL.
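To make the stop/start imbalance concrete, here is a minimal user-space sketch, assuming a heavily simplified model of the connect path. It is not virt/lib/irqbypass.c: model_consumer, model_cons_stop(), model_cons_start(), model_cons_add_producer() and model_connect() are hypothetical names, and the error value is illustrative. The only point it models is that skipping the start callback after a failed add_producer leaves the guest paused, while always restarting still lets the error reach the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the irq_bypass connect path.
 * NOT the kernel's code: all names here are made up for illustration.
 */
struct model_consumer {
	bool guest_running; /* vCPUs running (consumer "started") */
	bool has_ite;       /* guest created an ITE for this vector */
	bool forwarded;     /* vLPI forwarding was set up */
};

/* Stand-ins for cons->stop / cons->start (pause / resume the guest). */
static void model_cons_stop(struct model_consumer *c)
{
	c->guest_running = false;
}

static void model_cons_start(struct model_consumer *c)
{
	c->guest_running = true;
}

/*
 * Stand-in for cons->add_producer, which on arm64 ends up in
 * kvm_vgic_v4_set_forwarding(): with no matching guest LPI there is
 * nothing to forward, so it fails. The errno value is illustrative.
 */
static int model_cons_add_producer(struct model_consumer *c)
{
	if (!c->has_ite)
		return -EINVAL;
	c->forwarded = true;
	return 0;
}

/*
 * restart_on_failure == false models the behaviour after a979a6aa009f
 * (start skipped when connecting fails); true models the proposed
 * revert (always restart what was stopped, but still report the error).
 */
static int model_connect(struct model_consumer *c, bool restart_on_failure)
{
	int ret;

	model_cons_stop(c);
	ret = model_cons_add_producer(c);
	if (ret == 0 || restart_on_failure)
		model_cons_start(c);

	/* On success, forwarding must have been established. */
	assert(ret != 0 || c->forwarded);
	return ret;
}
```

With has_ite set to false (the 4th vector), model_connect(&c, false) returns an error and leaves guest_running false, which is the reported hang; model_connect(&c, true) returns the same error but leaves the guest running, so VFIO can still log the failure and carry on.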
So returning an error from the KVM code is useful, at least to an
extent.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.