Re: Question about RDMA server...

On Thu, 2022-03-31 at 16:20 +0000, Chuck Lever III wrote:
> 
> 
> > On Mar 31, 2022, at 12:15 PM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Thu, 2022-03-31 at 15:59 +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Mar 31, 2022, at 11:21 AM, Chuck Lever III
> > > > <chuck.lever@xxxxxxxxxx> wrote:
> > > > 
> > > > 
> > > > 
> > > > > On Mar 31, 2022, at 11:16 AM, Trond Myklebust
> > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > Hi Chuck,
> > > > > 
> > > > > 
> > > > > I'm seeing a very weird stack trace on one of our systems:
> > > > > 
> > > > > [88463.974603] BUG: kernel NULL pointer dereference, address: 0000000000000058
> > > > > [88463.974778] #PF: supervisor read access in kernel mode
> > > > > [88463.974916] #PF: error_code(0x0000) - not-present page
> > > > > [88463.975037] PGD 0 P4D 0
> > > > > [88463.975164] Oops: 0000 [#1] SMP NOPTI
> > > > > [88463.975296] CPU: 4 PID: 12597 Comm: nfsd Kdump: loaded Not tainted 5.15.31-200.pd.17802.el7.x86_64 #1
> > > > > [88463.975452] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
> > > > > [88463.975630] RIP: 0010:svc_rdma_sendto+0x26/0x330 [rpcrdma]
> > > > > [88463.975831] Code: 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 28 4c 8b 6f 20 4c 8b a7 90 01 00 00 48 89 3c 24 49 8b 45 38 <49> 8b 5c 24 58 a8 40 0f 85 f8 01 00 00 49 8b 45 38 a8 04 0f 85 ec
> > > > > [88463.976247] RSP: 0018:ffffad54c20b3e80 EFLAGS: 00010282
> > > > > [88463.976469] RAX: 0000000000004119 RBX: ffff94557ccd8400 RCX: ffff9455daf0c000
> > > > > [88463.976705] RDX: ffff9455daf0c000 RSI: ffff9455da8601a8 RDI: ffff9455da860000
> > > > > [88463.976946] RBP: ffff9455da860000 R08: ffff945586b6b388 R09: ffff945586b6b388
> > > > > [88463.977196] R10: 7f0f1dac00000002 R11: 0000000000000000 R12: 0000000000000000
> > > > > [88463.977449] R13: ffff945663080800 R14: ffff9455da860000 R15: ffff9455dabb8000
> > > > > [88463.977709] FS:  0000000000000000(0000) GS:ffff94586fd00000(0000) knlGS:0000000000000000
> > > > > [88463.977982] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [88463.978254] CR2: 0000000000000058 CR3: 00000001f9282006 CR4: 00000000003706e0
> > > > > [88463.978583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [88463.978865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [88463.979141] Call Trace:
> > > > > [88463.979419]  <TASK>
> > > > > [88463.979693]  ? svc_process_common+0xfa/0x6a0 [sunrpc]
> > > > > [88463.980021]  ? svc_rdma_cma_handler+0x30/0x30 [rpcrdma]
> > > > > [88463.980320]  ? svc_recv+0x48a/0x8c0 [sunrpc]
> > > > > [88463.980662]  svc_send+0x49/0x120 [sunrpc]
> > > > > [88463.981009]  nfsd+0xe8/0x140 [nfsd]
> > > > > [88463.981346]  ? nfsd_shutdown_threads+0x80/0x80 [nfsd]
> > > > > [88463.981675]  kthread+0x127/0x150
> > > > > [88463.981981]  ? set_kthread_struct+0x40/0x40
> > > > > [88463.982284]  ret_from_fork+0x22/0x30
> > > > > [88463.982586]  </TASK>
> > > > > [88463.982886] Modules linked in: nfsv3 bpf_preload xt_nat veth
> > > > > nfs_layout_flexfiles auth_name rpcsec_gss_krb5 nfsv4 dns_resolver
> > > > > nfsidmap nfs nfsd auth_rpcgss nfs_acl lockd grace xt_MASQUERADE
> > > > > nf_conntrack_netlink xt_addrtype br_netfilter bridge stp llc
> > > > > overlay nf_nat_ftp nf_conntrack_netbios_ns nf_conntrack_broadcast
> > > > > nf_conntrack_ftp xt_CT xt_sctp ip6t_rpfilter ip6t_REJECT
> > > > > nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack
> > > > > ip6table_nat ip6table_mangle ip6table_security ip6table_raw
> > > > > iptable_nat nf_nat iptable_mangle iptable_security iptable_raw
> > > > > nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink
> > > > > ip6table_filter ip6_tables iptable_filter vsock_loopback
> > > > > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> > > > > bonding ipmi_msghandler vfat fat rpcrdma sunrpc rdma_ucm ib_iser
> > > > > libiscsi scsi_transport_iscsi ib_umad rdma_cm iw_cm ib_ipoib
> > > > > ib_cm intel_rapl_msr intel_rapl_common crct10dif_pclmul mlx5_ib
> > > > > crc32_pclmul ghash_clmulni_intel rapl ib_uverbs ib_core vmw_vmci
> > > > > i2c_piix4
> > > > > [88463.982930]  dm_multipath ip_tables xfs mlx5_core crc32c_intel
> > > > > ttm vmxnet3 vmw_pvscsi drm_kms_helper mlxfw cec pci_hyperv_intf
> > > > > tls ata_generic drm pata_acpi psample
> > > > > 
> > > > > Decoding the address svc_rdma_sendto+0x26 resolves to line 927
> > > > > in net/sunrpc/xprtrdma/svc_rdma_sendto.c, i.e.
> > > > > 
> > > > > 927             __be32 *rdma_argp = rctxt->rc_recv_buf;
> > > > > 
> > > > > which shows that somehow, we're getting to svc_rdma_sendto()
> > > > > with a request that has rqstp->rq_xprt_ctxt == NULL.
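> > > > > 
> > > > > For reference, the top of svc_rdma_sendto() in 5.15 looks
> > > > > roughly like this (paraphrased, so treat it as a sketch rather
> > > > > than the exact source):
> > > > > 
> > > > >     int svc_rdma_sendto(struct svc_rqst *rqstp)
> > > > >     {
> > > > >             struct svc_xprt *xprt = rqstp->rq_xprt;
> > > > >             struct svcxprt_rdma *rdma =
> > > > >                     container_of(xprt, struct svcxprt_rdma, sc_xprt);
> > > > >             /* NULL if nothing stashed a recv context here: */
> > > > >             struct svc_rdma_recv_ctxt *rctxt = rqstp->rq_xprt_ctxt;
> > > > >             __be32 *rdma_argp = rctxt->rc_recv_buf;  /* line 927 */
> > > > >             ...
> > > > > 
> > > > > so the rctxt dereference happens unconditionally, before any
> > > > > check of the transport's state.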
> > > > > 
> > > > > I'm having trouble seeing how that can happen, but thought I'd
> > > > > ask in case you've seen something similar. AFAICS, the code in
> > > > > that area has not changed since early 2021.
> > > > > 
> > > > > As I understand it, knfsd was in the process of shutting down
> > > > > at the time, so it is possible there is a connection to that...
> > > > 
> > > > My immediate response is "that's not supposed to happen". I'll
> > > > give it some thought, but yeah, I bet the transport sendto code
> > > > is not dealing with connection shutdown properly.
> > > 
> > > I still don't see exactly how @rctxt could be NULL here, but
> > > it might be enough to move the deref of @rctxt down past the
> > > svc_xprt_is_dead() test, say just before "*p++ = *rdma_argp;".
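> > > 
> > > A minimal (untested) sketch of that reordering, just to make the
> > > idea concrete:
> > > 
> > >     ret = -ENOTCONN;
> > >     if (svc_xprt_is_dead(xprt))
> > >             goto drop;
> > >     ...
> > >     rdma_argp = rctxt->rc_recv_buf;
> > >     *p++ = *rdma_argp;
> > > 
> > > That way a dead transport is caught before rctxt is dereferenced,
> > > though it wouldn't explain how rctxt became NULL in the first
> > > place.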
> > > 
> > > In any event, the recent reorganization of the generic transport
> > > code might be responsible, or maybe this is just a very hard race
> > > to hit.
> > > 
> > 
> > Hmm... Here's another thought. What if this were a deferred
> > request that is being replayed after an upcall to mountd or the
> > idmapper? It would mean that the synchronous wait in
> > cache_defer_req() failed, so it is going to be rare, but it could
> > happen on a congested system.
> > 
> > AFAICS, svc_defer() does _not_ save rqstp->rq_xprt_ctxt, so
> > svc_deferred_recv() won't restore it either.
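> > 
> > From memory, the field-by-field copy in svc_defer() is roughly:
> > 
> >     dr->handle.owner = rqstp->rq_server;
> >     dr->prot = rqstp->rq_prot;
> >     memcpy(&dr->addr, &rqstp->rq_addr, rqstp->rq_addrlen);
> >     dr->addrlen = rqstp->rq_addrlen;
> >     dr->daddr = rqstp->rq_daddr;
> >     dr->argslen = rqstp->rq_arg.len >> 2;
> > 
> > with nothing resembling a dr->xprt_ctxt, so when the request is
> > replayed through svc_deferred_recv() the transport-private context
> > is simply gone. That would match the NULL rq_xprt_ctxt we're
> > seeing.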
> 
> True, but TCP and UDP both use rq_xprt_ctxt, so wouldn't we have
> seen this problem before on a socket transport?

TCP does not set rq_xprt_ctxt, and nobody really uses UDP these days.
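For comparison, on the socket side only UDP uses the field:
svc_udp_recvfrom() stashes the received skb with

    rqstp->rq_xprt_ctxt = skb;

and the transport's ->xpo_release_rqst() consumes it later. The TCP
receive path never sets it, so a deferral that loses the field has
nothing to lose there.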

> 
> I need to audit code to see if saving rq_xprt_ctxt in svc_defer()
> is safe and reasonable to do. Maybe Bruce has a thought.
> 

It should be safe for the UDP case, AFAICS. I have no opinion yet
about how safe it is to do with RDMA.
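
If that theory holds, the fix presumably looks something like this
untested sketch (note the xprt_ctxt field on svc_deferred_req is
invented here; it doesn't exist today):

    /* in svc_defer(): */
    dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
    rqstp->rq_xprt_ctxt = NULL;

    /* in svc_deferred_recv(): */
    rqstp->rq_xprt_ctxt = dr->xprt_ctxt;

plus a guarantee that the transport can release the saved context
via ->xpo_release_rqst() when the replayed request completes. For
RDMA that means keeping the recv ctxt and its pages alive across
the deferral, which is exactly the part I have no opinion on yet.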

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





