Re: [PATCH rdma-rc] IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOV

On Tue, Jan 15, 2019 at 04:45:48PM +0200, Leon Romanovsky wrote:
> From: Jack Morgenstein <jackm@xxxxxxxxxxxxxxxxxx>
> 
> The commit cited below replaced rdma_create_ah with mlx4_ib_create_slave_ah
> when creating AHs for the paravirtualized special QPs.
> 
> However, this change also required replacing rdma_destroy_ah with
> mlx4_ib_destroy_ah in the affected flows.
> 
> The commit cited below missed 3 places where rdma_destroy_ah should have
> been replaced with mlx4_ib_destroy_ah.
> 
> As a result, the pd usecount was decremented when the ah was destroyed --
> although the usecount was NOT incremented when the ah was created.
> 
> This caused the pd usecount to become negative, and resulted in the
> WARN_ON stack trace below when the mlx4_ib.ko module was unloaded:
> 
> [ +27.766372] WARNING: CPU: 3 PID: 25303 at drivers/infiniband/core/verbs.c:329 ib_dealloc_pd+0x6d/0x80 [ib_core]
> [  +0.052115] Modules linked in: rdma_ucm rdma_cm iw_cm ib_cm ib_umad mlx4_ib(-) ib_uverbs ib_core mlx4_en mlx4_core nfsv3 nfs fscache configfs xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc dm_mirror dm_region_hash dm_log dm_mod dax rndis_wlan rndis_host coretemp kvm_intel cdc_ether kvm usbnet iTCO_wdt iTCO_vendor_support cfg80211 irqbypass lpc_ich ipmi_si i2c_i801 mii pcspkr i2c_core mfd_core ipmi_devintf i7core_edac ipmi_msghandler ioatdma pcc_cpufreq dca acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas mptscsih crc32c_intel ata_piix bnx2 mptbase ipv6 crc_ccitt autofs4 [last unloaded: mlx4_core]
> [  +0.347141] CPU: 3 PID: 25303 Comm: modprobe Tainted: G        W I       5.0.0-rc1-net-mlx4+ #1
> [  +0.055956] Hardware name: IBM  -[7148ZV6]-/Node 1, System Card, BIOS -[MLE170CUS-1.70]- 09/23/2011
> [  +0.056114] RIP: 0010:ib_dealloc_pd+0x6d/0x80 [ib_core]
> [  +0.051626] Code: 00 00 85 c0 75 02 5b c3 80 3d aa 87 03 00 00 75 f5 48 c7 c7 88 d7 8f a0 31 c0 c6 05 98 87 03 00 01 e8 07 4c 79 e0 0f 0b 5b c3 <0f> 0b eb be 0f 0b eb ab 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
> [  +0.114373] RSP: 0018:ffffc90005347e30 EFLAGS: 00010282
> [  +0.052435] RAX: 00000000ffffffea RBX: ffff8888589e9540 RCX: 0000000000000006
> [  +0.055135] RDX: 0000000000000006 RSI: ffff88885d57ad40 RDI: 0000000000000000
> [  +0.054560] RBP: ffff88885b029c00 R08: 0000000000000000 R09: 0000000000000000
> [  +0.054267] R10: 0000000000000001 R11: 0000000000000004 R12: ffff8887f06c0000
> [  +0.053624] R13: ffff8887f06c13e8 R14: 0000000000000000 R15: 0000000000000000
> [  +0.053662] FS:  00007fd6743c6740(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000
> [  +0.054958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  +0.052484] CR2: 0000000000ed1038 CR3: 00000007e3156000 CR4: 00000000000006e0
> [  +0.053886] Call Trace:
> [  +0.048621]  mlx4_ib_close_sriov+0x125/0x180 [mlx4_ib]
> [  +0.051892]  mlx4_ib_remove+0x57/0x1f0 [mlx4_ib]
> [  +0.050903]  mlx4_remove_device+0x92/0xa0 [mlx4_core]
> [  +0.051229]  mlx4_unregister_interface+0x39/0x90 [mlx4_core]
> [  +0.052218]  mlx4_ib_cleanup+0xc/0xd7 [mlx4_ib]
> [  +0.051151]  __x64_sys_delete_module+0x17d/0x290
> [  +0.051631]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> [  +0.051736]  ? do_syscall_64+0x12/0x180
> [  +0.050383]  do_syscall_64+0x4a/0x180
> [  +0.049903]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  +0.051442] RIP: 0033:0x7fd6738b66b7
> [  +0.049525] Code: 73 01 c3 48 8b 0d d1 37 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 37 2c 00 f7 d8 64 89 01 48
> [  +0.114244] RSP: 002b:00007fffc86011a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
> [  +0.056111] RAX: ffffffffffffffda RBX: 0000000000ece490 RCX: 00007fd6738b66b7
> [  +0.056011] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000000000ece4f8
> [  +0.055814] RBP: 0000000000000000 R08: 00007fd673b7b060 R09: 00007fd673926a40
> [  +0.055669] R10: 00007fffc8600f30 R11: 0000000000000202 R12: 0000000000000000
> [  +0.055759] R13: 0000000000000001 R14: 0000000000ece4f8 R15: 0000000000000000
> [  +0.055421] irq event stamp: 8238
> [  +0.051352] hardirqs last  enabled at (8237): [<ffffffff8121c426>] kfree+0xc6/0x190
> [  +0.056534] hardirqs last disabled at (8238): [<ffffffff81001bc7>] trace_hardirqs_off_thunk+0x1a/0x1c
> [  +0.058511] softirqs last  enabled at (3768): [<ffffffff81a0026d>] __do_softirq+0x26d/0x41d
> [  +0.057742] softirqs last disabled at (3755): [<ffffffff81075417>] irq_exit+0xb7/0xc0
> [  +0.056827] ---[ end trace 302e3cc77eb74c0f ]---
> [  +3.877564] mlx4_core 0000:0e:00.0: Disabling SR-IOV
> 
> Fixes: 5e62d5ff1b9a ("IB/mlx4: Create slave AH's directly")
> Signed-off-by: Jack Morgenstein <jackm@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> ---
>  drivers/infiniband/hw/mlx4/mad.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Whoops, we all missed this in the first patch. Applied to for-rc.

Thanks,
Jason


