Re: [PATCH] RDMA: null pointer in __ib_umem_release causes kernel panic

On Wed, 2022-01-05 at 12:09 -0400, Jason Gunthorpe wrote:
> On Wed, Jan 05, 2022 at 03:02:34PM +0000, Trond Myklebust wrote:
> > On Wed, 2022-01-05 at 10:37 -0400, Jason Gunthorpe wrote:
> > > On Wed, Jan 05, 2022 at 09:18:41AM -0500,
> > > trondmy@xxxxxxxxxx wrote:
> > > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > > 
> > > > When doing RPC/RDMA, we're seeing a kernel panic when
> > > > __ib_umem_release() iterates over the scatter gather list and
> > > > hits NULL pages.
> > > > 
> > > > It turns out that commit 79fbd3e1241c ended up changing the
> > > > iteration from being over only the mapped entries to being
> > > > over the original list size.
> > > 
> > > You mean this?
> > > 
> > > -       for_each_sg(umem->sg_head.sgl, sg, umem->sg_nents, i)
> > > +       for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> > > 
> > > I don't see what changed there? The invariant should be that
> > > 
> > >   umem->sg_nents == sgt->orig_nents
> > > 
> > > > @@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> > > >                 ib_dma_unmap_sgtable_attrs(dev, &umem->sgt_append.sgt,
> > > >                                            DMA_BIDIRECTIONAL, 0);
> > > >  
> > > > -       for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i)
> > > > +       for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i)
> > > >                 unpin_user_page_range_dirty_lock(sg_page(sg),
> > > 
> > > Calling sg_page() from under a dma_sg iterator is unconditionally
> > > wrong..
> > > 
> > > More likely your case is something has gone wrong when the
> > > sgtable was created and it has the wrong value in orig_nents..
> > 
> > Can you define "wrong value" in this case? Chuck's RPC/RDMA code
> > appears to call ib_alloc_mr() with an 'expected maximum number of
> > entries' (depth) in net/sunrpc/xprtrdma/frwr_ops.c:frwr_mr_init().
> > 
> > It then fills that table with a set of n <= depth pages in
> > net/sunrpc/xprtrdma/frwr_ops.c:frwr_map() and calls ib_dma_map_sg()
> > to map them, and then adjusts the sgtable with a call to
> > ib_map_mr_sg().
> 
> I'm confused, RPC/RDMA should never touch a umem at all.
> 
> Is this really the other bug where user and kernel MR are getting
> confused?
> 

As far as I know, RPC/RDMA is just using the RDMA API to register and
unregister chunks of memory, so it is definitely not directly touching
the umem.
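
To make that concrete, the registration path Chuck's code goes through
looks roughly like this (a simplified sketch with illustrative variable
names and error handling omitted; not the exact frwr_ops.c code):

    /* frwr_mr_init(): allocate an FRWR MR sized for the expected
     * maximum number of pages ("depth") */
    mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);

    /* frwr_map(): build a scatterlist for the n <= depth pages in
     * this chunk */
    sg_init_table(sg, depth);
    for (i = 0; i < n; i++)
        sg_set_page(&sg[i], pages[i], lengths[i], offsets[i]);

    /* DMA-map the list for the device... */
    dma_nents = ib_dma_map_sg(device, sg, n, DMA_BIDIRECTIONAL);

    /* ...and have the driver populate the MR from the mapped entries */
    ret = ib_map_mr_sg(mr, sg, dma_nents, NULL, PAGE_SIZE);

So from the sunrpc side the scatterlist is only ever populated and
mapped through those calls; we never allocate or see a umem or its
sgt_append table.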

Let me just copy/paste the stack dump that we got. This was a 5.15.12
kernel running on Azure with an InfiniBand setup. The oops was
interleaved on the console with "nfs: server not responding" messages,
but the stack trace itself is fairly clear that this is happening on
transport teardown and that we're iterating through a NULL page.


[ 2922.656486] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2922.660304] #PF: supervisor read access in kernel mode
[ 2922.662320] #PF: error_code(0x0000) - not-present page
[ 2922.664606] PGD 1d6c381067 P4D 1d6c381067 PUD 1d0c93e067 PMD 0
[ 2922.667089] Oops: 0000 [#1] SMP NOPTI
[ 2922.668823] CPU: 27 PID: 11384 Comm: kworker/u241:4 Kdump: loaded
Tainted: G        W         5.15.12-200.pd.17715.el7.x86_64 #1
[ 2922.673695] Hardware name: Microsoft Corporation Virtual Machine/Virtual
Machine, BIOS 090008  12/07/2018
[ 2922.677741] Workqueue: xprtiod xprt_autoclose [sunrpc]
[ 2922.679875] RIP: 0010:__ib_umem_release+0x7a/0xa0 [ib_uverbs]
[ 2922.682325] Code: f6 cb 48 89 ef e8 16 90 2e cc 48 89 c5 41 39 5c 24
5c 77 cd 5b 49 8d 7c 24 50 5d 41 5c 41 5d e9 ec 96 2e cc 41 bd 01 00 00
00 <48> 8b 3f 48 85 ff 74 9f 41 8b 54 24 5c 49 8b 74 24 50 45 31 c0 31
[ 2922.690363] RSP: 0018:ffffb53dee517d28 EFLAGS: 00010246
[ 2922.692787] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffff9f222edfa180
[ 2922.695675] RDX: 0000000000000001 RSI: ffff9f31b0dc4000 RDI:
0000000000000000
[ 2922.698735] RBP: ffff9f31b0dc4000 R08: 0000000000000000 R09:
0000000000000000
[ 2922.701658] R10: ffff9f3ddfb69680 R11: ffffb53dee517d50 R12:
ffff9f31b0dc4000
[ 2922.704764] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff9f23f2cc09b0
[ 2922.708268] FS:  0000000000000000(0000) GS:ffff9f3d0fcc0000(0000)
knlGS:0000000000000000
[ 2922.713056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2922.715925] CR2: 0000000000000000 CR3: 0000001dafdb6005 CR4:
0000000000370ee0
[ 2922.720003] Call Trace:
[ 2922.721202]  <TASK>
[ 2922.722467]  ib_umem_release+0x2a/0x90 [ib_uverbs]
[ 2922.724829]  mlx5_ib_dereg_mr+0x1fe/0x3d0 [mlx5_ib]
[ 2922.727447]  ib_dereg_mr_user+0x40/0xa0 [ib_core]
[ 2922.729658]  frwr_mr_release+0x20/0x70 [rpcrdma]
[ 2922.732133]  rpcrdma_xprt_disconnect+0x2b4/0x3a0 [rpcrdma]
[ 2922.734753]  xprt_rdma_close+0xe/0x30 [rpcrdma]
[ 2922.737096]  xprt_autoclose+0x52/0xf0 [sunrpc]
[ 2922.739248]  process_one_work+0x1f1/0x390
[ 2922.741247]  worker_thread+0x53/0x3e0
[ 2922.743081]  ? process_one_work+0x390/0x390
[ 2922.745341]  kthread+0x127/0x150
[ 2922.747185]  ? set_kthread_struct+0x40/0x40
[ 2922.749685]  ret_from_fork+0x22/0x30
[ 2922.751889]  </TASK>
[ 2922.753068] Modules linked in: nfsv3 bpf_preload veth
nfs_layout_flexfiles auth_name rpcsec_gss_krb5 nfsv4 dns_resolver
nfsidmap nfs fscache netfs xt_nat xt_MASQUERADE nf_conntrack_netlink
xt_addrtype br_netfilter bridge stp llc nfsd auth_rpcgss overlay nfs_acl
lockd grace xt_owner nf_nat_ftp nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_ftp xt_CT xt_sctp ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack
ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat
nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter
ip6_tables iptable_filter bonding ipmi_msghandler rpcrdma sunrpc
rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib
iw_cm ib_cm mlx5_ib raid0 ib_uverbs ib_core vfat fat mlx5_core nvme
mlxfw psample nvme_core tls intel_rapl_msr intel_rapl_common
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pci_hyperv hyperv_drm
hv_balloon
[ 2922.753108]  hv_utils hyperv_keyboard hid_hyperv pci_hyperv_intf
drm_kms_helper cec drm i2c_piix4 hyperv_fb dm_multipath ip_tables xfs
hv_storvsc hv_netvsc scsi_transport_fc crc32c_intel ata_generic
hv_vmbus pata_acpi
[ 2922.803987] CR2: 0000000000000000
[ 2922.805713] ---[ end trace 450d08bd4a337f29 ]---
[ 2922.808718] RIP: 0010:__ib_umem_release+0x7a/0xa0 [ib_uverbs]
[ 2922.812103] Code: f6 cb 48 89 ef e8 16 90 2e cc 48 89 c5 41 39 5c 24
5c 77 cd 5b 49 8d 7c 24 50 5d 41 5c 41 5d e9 ec 96 2e cc 41 bd 01 00 00
00 <48> 8b 3f 48 85 ff 74 9f 41 8b 54 24 5c 49 8b 74 24 50 45 31 c0 31
[ 2922.821369] RSP: 0018:ffffb53dee517d28 EFLAGS: 00010246
[ 2922.824135] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffff9f222edfa180
[ 2922.827639] RDX: 0000000000000001 RSI: ffff9f31b0dc4000 RDI:
0000000000000000
[ 2922.831121] RBP: ffff9f31b0dc4000 R08: 0000000000000000 R09:
0000000000000000
[ 2922.834528] R10: ffff9f3ddfb69680 R11: ffffb53dee517d50 R12:
ffff9f31b0dc4000
[ 2922.838042] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff9f23f2cc09b0
[ 2922.841442] FS:  0000000000000000(0000) GS:ffff9f3d0fcc0000(0000)
knlGS:0000000000000000
[ 2922.845340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2922.848327] CR2: 0000000000000000 CR3: 0000001dafdb6005 CR4:
0000000000370ee0
[ 2922.852791] Kernel panic - not syncing: Fatal exception
[ 2922.858167] Kernel Offset: 0xc000000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
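
For anyone following along, my understanding of the iterator
distinction Jason points out above is roughly this (an illustrative
sketch of the two scatterlist iterators, not code from the tree):

    struct scatterlist *sg;
    int i;

    /* CPU-side view: walks sgt->orig_nents entries, one per run of
     * pinned pages, so sg_page() is meaningful here */
    for_each_sgtable_sg(&umem->sgt_append.sgt, sg, i) {
        struct page *page = sg_page(sg);
        /* unpin/dirty the page range, etc. */
    }

    /* DMA-side view: walks sgt->nents entries, which DMA mapping may
     * have coalesced; only sg_dma_address()/sg_dma_len() are valid
     * here, and sg_page() is not */
    for_each_sgtable_dma_sg(&umem->sgt_append.sgt, sg, i) {
        dma_addr_t addr = sg_dma_address(sg);
        unsigned int len = sg_dma_len(sg);
        /* hand addr/len to the device */
    }

i.e. sg_page() only makes sense in the first loop, which I take to be
the point about my proposed for_each_sgtable_dma_sg() change.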



-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx





