Re: [PATCH 1/2] KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough interrupts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 28 May 2019 14:17:15 +0200
Cédric Le Goater <clg@xxxxxxxx> wrote:

> The passthrough interrupts are defined at the host level and their IRQ
> data should not be cleared unless specifically deconfigured (shutdown)
> by the host. They differ from the IPI interrupts which are allocated
> by the XIVE KVM device and reserved to the guest usage only.
> 
> This fixes a host crash when destroying a VM in which a PCI adapter
> was passed-through. In this case, the interrupt is cleared and freed
> by the KVM device and then shutdown by vfio at the host level.
> 
> [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
> [ 1007.360285] Faulting instruction address: 0xc00000000009da34
> [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
> [ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
> [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
> [ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
> [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
> [ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
> [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
> [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
> [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
> [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
> [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
> [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
> [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
> [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
> [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
> [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
> [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
> [ 1007.360732] Call Trace:
> [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
> [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
> [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
> [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
> [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
> [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
> [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
> [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
> [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
> [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
> [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
> [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
> [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
> [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
> [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
> [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
> [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
> [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
> [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
> [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
> [ 1007.361159] Instruction dump:
> [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
> [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022
> 
> Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")

Maybe worth adding:

Cc: stable@xxxxxxxxxxxxxxx # v4.12+

> Signed-off-by: Cédric Le Goater <clg@xxxxxxxx>
> Signed-off-by: Greg Kurz <groug@xxxxxxxx>
> ---
>  arch/powerpc/kvm/book3s_xive.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 12c8a36dd980..922fd62bcd2a 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1828,7 +1828,6 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
>  {
>  	xive_vm_esb_load(xd, XIVE_ESB_SET_PQ_01);
>  	xive_native_configure_irq(hw_num, 0, MASKED, 0);
> -	xive_cleanup_irq_data(xd);
>  }
>  
>  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
> @@ -1842,9 +1841,10 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>  			continue;
>  
>  		kvmppc_xive_cleanup_irq(state->ipi_number, &state->ipi_data);
> +		xive_cleanup_irq_data(&state->ipi_data);
>  		xive_native_free_irq(state->ipi_number);
>  
> -		/* Pass-through, cleanup too */
> +		/* Pass-through, cleanup too but keep IRQ hw data */
>  		if (state->pt_number)
>  			kvmppc_xive_cleanup_irq(state->pt_number, state->pt_data);
>  

Also, even if this definitely allows to avoid the crash, I'm still not
convinced we should be calling kvmppc_xive_cleanup_irq() for pass-through
at all.

My concern is that kvmppc_xive_clr_mapped() which gets called when VFIO shuts
the interrupt down seems to be doing extra stuff to release the pass-through
interrupt back to the host. But when the KVM device gets released before VFIO
had a chance to do that, kvmppc_xive_clr_mapped() is a nop since both
kvmppc_xive_release() and kvmppc_xive_native_release() set kvm.arch->xive to
NULL. This triggers the following warning in the xics-on-xive case:

[25185.218975] kvmppc_clr_passthru_irq (irq 94, gsi 4870) fails: -19

I'm wondering if we should do this extra stuff from kvmppc_xive_clr_mapped()
as well when closing the KVM device instead of clearing the interrupt as the
we do now.

This needs some more investigation.

Cheers,

--
Greg



[Index of Archives]     [KVM Development]     [KVM ARM]     [KVM ia64]     [Linux Virtualization]     [Linux USB Devel]     [Linux Video]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux