> On Mar 13, 2018, at 10:51 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>> On Mar 13, 2018, at 9:16 AM, Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx> wrote:
>>
>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Kalderon, Michal
>>>
>>>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
>>>> Sent: Wednesday, February 14, 2018 6:58 PM
>>>>
>>>>> On Feb 14, 2018, at 11:49 AM, Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx> wrote:
>>>>>
>>>>>> From: Leon Romanovsky [mailto:leon@xxxxxxxxxx]
>>>>>> Sent: Wednesday, February 14, 2018 6:34 PM
>>>>>> To: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>> Cc: Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx>; Le, Thong <Thong.Le@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
>>>>>> Subject: Re: rdma resource warning on 4.16-rc1 when unloading qedr after NFS mount
>>>>>>
>>>>>> On Wed, Feb 14, 2018 at 11:20:39AM -0500, Chuck Lever wrote:
>>>>>>>
>>>>>>>> On Feb 14, 2018, at 11:00 AM, Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Leon, Chuck,
>>>>>>>>
>>>>>>>> We ran an NFS mount over qedr using 4.16-rc1. When unloading qedr we get a WARNING from the resource tracker (pasted below).
>>>>>>>>
>>>>>>>> Can you please advise on the best way to debug this? How can we get more info on the resource not being freed?
>>>>>>>
>>>>>>> I haven't seen this kind of report before, so I can't directly answer your questions. But can you tell us more about reproducing it:
>>>>>>
>>>>>> It is the resource tracking that was introduced in the last merge window.
>>>>>>
>>>>>>> - Is there a workload running on the NFS mount point when the module is unloaded?
>>>>> No.
>>>>>
>>>>>>> - Is the issue 100% reproducible, or intermittent?
>>>>> It seems to be reproducible.
>>>>>
>>>>>>> - Have you tried bisecting?
>>>>> No, bisecting is a tough one here since we ran this scenario to verify the last two related NFS fixes:
>>>>> e89e8d8 xprtrdma: Fix BUG after a device removal
>>>>> 1179e2c xprtrdma: Fix calculation of ri_max_send_sges
>>>>>
>>>>>> It will be one of three patches:
>>>>>> 9d5f8c209b3f RDMA/core: Add resource tracking for create and destroy PDs
>>>>>> 08f294a1524b RDMA/core: Add resource tracking for create and destroy CQs
>>>>>> 78a0cd648a80 RDMA/core: Add resource tracking for create and destroy QPs
>>>>> Do you think these could lead to a resource not being freed? Or only issues with tracking?
>>>>>
>>>>>>> - iWARP, RoCE, or both?
>>>>> Only tested over RoCE for now.
>>>>>
>>>>>>> - Have you tried reproducing with a different model of device?
>>>>> No.
>>>>>
>>>>>> I doubt that it is related to the device; it looks like a resource leak while removing rpcrdma.
>>>>>>
>>>>>> We definitely need to add more information to this warning to understand which one of the three available resources wasn't freed.
>>>>>
>>>>> We missed an output from our driver saying there's a PD not freed. As mentioned, due to other issues we're not sure whether we've seen this message from our driver in the past.
>>>>
>>>> When I've tested device unload with rpcrdma.ko, the unload hangs if rpcrdma.ko doesn't release all resources.
>>>>
>>>> rpcrdma_ia_remove() releases transport resources. It destroys the QP and CQs, but leaves the ID and PD to be destroyed by the device driver or core. The CM event handler returns 1 to signal this is the case.
>>>>
>>>> I suspect it could be a driver bug.
>>> Our driver doesn't take care of releasing PDs; it counts on the layers above to do so. Why should the PD be treated differently than the CQs/QPs in this case? We will look into this further to understand whether this is newly introduced.
>>> Thanks
>>
>> Hi Chuck, the PD that is not freed here by rpcrdma is freed if we issue a umount.
>>
>> Mount: this is the creation of the PD:
>> [ 1162.401116]  ? rpcrdma_create_id+0x20b/0x270 [rpcrdma]
>> [ 1162.401124]  rpcrdma_ia_open+0x40/0xe0 [rpcrdma]
>> [ 1162.401132]  xprt_setup_rdma+0x110/0x3a0 [rpcrdma]
>> [ 1162.401147]  xprt_create_transport+0x7d/0x210 [sunrpc]
>> [ 1162.401161]  rpc_create+0xc5/0x1c0 [sunrpc]
>>
>> Umount:
>> [ 1011.602701]  qedr_dealloc_pd+0x18/0x90 [qedr]
>> [ 1011.602709]  ib_dealloc_pd+0x45/0x80 [ib_core]
>> [ 1011.602716]  rpcrdma_ia_close+0x57/0x70 [rpcrdma]
>> [ 1011.602719]  xprt_rdma_destroy+0x4d/0xb0 [rpcrdma]
>
> That is by design. Whether that design is correct or not remains to be seen.
>
> It wasn't clear to me that deallocating the PD on device removal was necessary. At least the ID has to stay around until the core removes it.
>
> No one complained about the missing ib_dealloc_pd during review.
>
> And, since I was able to unload the device driver with the current design, I thought my assumption about leaving the PD was correct. Under normal circumstances, with the current kernel, this is still the case, and I don't see restracker warnings unless the transport is in some pathological state.
>
>> Why not call rpcrdma_ia_close from rpcrdma_ia_remove?
>
> rpcrdma_ia_close also destroys the ID.
>
> I suppose that since the actual work of tearing things down is done in another thread, it would be safe for xprtrdma to destroy the ID itself, rather than having the core do it once the upcall returns. In at least one of the prototypes, the tear-down was done in the upcall thread, so the ID had to be left alone. That aspect of the design has stayed in the code, perhaps unnecessarily.

I take that back: the core is holding a mutex during the upcall, so calling rdma_destroy_id will likely deadlock no matter what thread is calling.

The most back-portable approach might be to dealloc the PD in rpcrdma_ia_remove.
rpcrdma_ia_close and rpcrdma_ia_remove can then be de-duplicated in a subsequent patch.

 447         ib_free_cq(ep->rep_attr.recv_cq);
 448         ib_free_cq(ep->rep_attr.send_cq);
+++          ib_dealloc_pd(ia->ri_pd);
 449

Fixes: bebd03186 ("xprtrdma: Support unplugging an HCA from under an NFS mount")

Can you give that a try?

> Advice on this is welcome!
>
>> Thanks,
>> Michal
>>
>>>>>>>> Thanks,
>>>>>>>> Michal
>>>>>>>>
>>>>>>>> GAD17990 login: [  300.480137] ib_srpt srpt_remove_one(qedr0): nothing to do.
>>>>>>>> [  300.515527] ib_srpt srpt_remove_one(qedr1): nothing to do.
>>>>>>>> [  300.542182] rpcrdma: removing device qedr1 for 192.168.110.146:20049
>>>>>>>> [  300.573789] WARNING: CPU: 12 PID: 3545 at drivers/infiniband/core/restrack.c:20 rdma_restrack_clean+0x25/0x30 [ib_core]
>>>>>>>> [  300.625985] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm 8021q garp mrp qedr(-) ib_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables fuse ip6table_filter ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod vfat fat dax intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd ipmi_si
>>>>>>>> [  300.972993] iTCO_wdt ipmi_devintf sg pcspkr iTCO_vendor_support hpwdt hpilo lpc_ich ipmi_msghandler pcc_cpufreq ioatdma i2c_i801 mfd_core wmi shpchp dca acpi_power_meter i2c_core nfsd
>>>>>>>> auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod qede qed crc32c_intel tg3 hpsa scsi_transport_sas crc8
>>>>>>>> [  301.109036] CPU: 12 PID: 3545 Comm: rmmod Not tainted 4.16.0-rc1 #1
>>>>>>>> [  301.139518] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 02/17/2017
>>>>>>>> [  301.180411] RIP: 0010:rdma_restrack_clean+0x25/0x30 [ib_core]
>>>>>>>> [  301.208350] RSP: 0018:ffffb1820478fe88 EFLAGS: 00010286
>>>>>>>> [  301.233241] RAX: 0000000000000000 RBX: ffffa099ed1b4070 RCX: ffffdf02a193c800
>>>>>>>> [  301.268001] RDX: ffffa095ed12d7a0 RSI: 0000000000025900 RDI: ffffa099ed1b47d0
>>>>>>>> [  301.302530] RBP: ffffa099ed1b4070 R08: ffffa095de9dd000 R09: 0000000180080007
>>>>>>>> [  301.337245] R10: 0000000000000001 R11: ffffa095de9dd000 R12: ffffa099ed1b4000
>>>>>>>> [  301.372151] R13: ffffa099ed1b405c R14: 0000000000e231c0 R15: 0000000000e23010
>>>>>>>> [  301.407384] FS:  00007f2b0c854740(0000) GS:ffffa099ff700000(0000) knlGS:0000000000000000
>>>>>>>> [  301.447026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [  301.475409] CR2: 0000000000e2caf8 CR3: 0000000865c0d006 CR4: 00000000001606e0
>>>>>>>> [  301.510892] Call Trace:
>>>>>>>> [  301.522715]  ib_unregister_device+0xf5/0x190 [ib_core]
>>>>>>>> [  301.547966]  qedr_remove+0x37/0x60 [qedr]
>>>>>>>> [  301.568393]  qede_rdma_unregister_driver+0x4b/0x90 [qede]
>>>>>>>> [  301.594980]  SyS_delete_module+0x168/0x240
>>>>>>>> [  301.615057]  do_syscall_64+0x6f/0x1a0
>>>>>>>> [  301.633588]  entry_SYSCALL_64_after_hwframe+0x21/0x86
>>>>>>>> [  301.658657] RIP: 0033:0x7f2b0bd33707
>>>>>>>> [  301.676005] RSP: 002b:00007ffdefa29d98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
>>>>>>>> [  301.713324] RAX: ffffffffffffffda RBX: 0000000000e231c0 RCX: 00007f2b0bd33707
>>>>>>>> [  301.748186] RDX: 00007f2b0bda3a80 RSI: 0000000000000800 RDI: 0000000000e23228
>>>>>>>> [  301.782960] RBP: 0000000000000000 R08: 00007f2b0bff8060 R09: 00007f2b0bda3a80
>>>>>>>> [  301.818142] R10: 00007ffdefa29b20 R11: 0000000000000202 R12: 00007ffdefa2b70d
>>>>>>>> [  301.853290] R13: 0000000000000000 R14: 0000000000e231c0 R15: 0000000000e23010
>>>>>>>> [  301.888138] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 83 c7 28 31 c0 eb 0c 48 83 c0 08 48 3d 00 08 00 00 74 0f 48 8d 14 07 48 8b 12 48 85 d2 74 e8 <0f> ff c3 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 47 28
>>>>>>>> [  301.981140] ---[ end trace 28dec8f15205789a ]---
>>>>>>>
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever