Re: kernel oops in generic/013 on an rdma mount (over either soft roce or iwarp)

On Fri, Nov 13, 2020 at 2:05 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
> > On Nov 13, 2020, at 1:49 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> >
> > On Thu, Nov 12, 2020 at 3:49 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >>
> >>
> >>> On Nov 12, 2020, at 10:37 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> >>>
> >>> On Thu, Nov 12, 2020 at 10:28 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Nov 11, 2020, at 4:42 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On Tue, Nov 10, 2020 at 1:25 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>> Start by identifying what NFS operation is failing, and what configuration
> >>>>>> of chunks it is using.
> >>>>>
> >>>>> This happens after decoding the LISTXATTRS reply. It's a Send-only reply.
> >>>>> I can't tell whether the real problem is that nfs4_xdr_dec_listxattrs()
> >>>>> is overwriting memory and messing up rpcrdma_complete_rqst(), or whether
> >>>>> it's an rdma problem.
> >>>>>
> >>>>> Running it with Kasan shows the following:
> >>>>>
> >>>>> [  538.505743] BUG: KASAN: wild-memory-access in
> >>>>> rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]
> >>>>> [  538.512019] Write of size 8 at addr 0005088000000000 by task kworker/1:1H/493
> >>>>> [  538.517285]
> >>>>> [  538.518219] CPU: 1 PID: 493 Comm: kworker/1:1H Not tainted 5.10.0-rc3+ #33
> >>>>> [  538.521811] Hardware name: VMware, Inc. VMware Virtual
> >>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> >>>>> [  538.529220] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> >>>>> [  538.532722] Call Trace:
> >>>>> [  538.534366]  dump_stack+0x7c/0xa2
> >>>>> [  538.536473]  ? rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]
> >>>>> [  538.539514]  ? rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]
> >>>>> [  538.542817]  kasan_report.cold.9+0x6a/0x7c
> >>>>> [  538.545952]  ? rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]
> >>>>> [  538.551001]  check_memory_region+0x198/0x200
> >>>>> [  538.553763]  memcpy+0x38/0x60
> >>>>> [  538.555612]  rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]
> >>>>> [  538.558974]  ? rpcrdma_reset_cwnd+0x70/0x70 [rpcrdma]
> >>>>> [  538.562162]  ? ktime_get+0x4f/0xb0
> >>>>> [  538.564072]  ? rpcrdma_reply_handler+0x4ca/0x640 [rpcrdma]
> >>>>> [  538.567066]  __ib_process_cq+0xa7/0x1f0 [ib_core]
> >>>>> [  538.569905]  ib_cq_poll_work+0x31/0xb0 [ib_core]
> >>>>> [  538.573151]  process_one_work+0x387/0x680
> >>>>> [  538.575798]  worker_thread+0x57/0x5a0
> >>>>> [  538.577917]  ? process_one_work+0x680/0x680
> >>>>> [  538.581095]  kthread+0x1c8/0x1f0
> >>>>> [  538.583271]  ? kthread_parkme+0x40/0x40
> >>>>> [  538.585637]  ret_from_fork+0x22/0x30
> >>>>> [  538.587688] ==================================================================
> >>>>> [  538.591920] Disabling lock debugging due to kernel taint
> >>>>> [  538.595267] general protection fault, probably for non-canonical
> >>>>> address 0x5088000000000: 0000 [#1] SMP KASAN PTI
> >>>>> [  538.601982] CPU: 1 PID: 3623 Comm: fsstress Tainted: G    B
> >>>>>  5.10.0-rc3+ #33
> >>>>> [  538.609032] Hardware name: VMware, Inc. VMware Virtual
> >>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> >>>>> [  538.619678] RIP: 0010:memcpy_orig+0xf5/0x10f
> >>>>> [  538.623892] Code: 00 00 00 00 00 83 fa 04 72 1b 8b 0e 44 8b 44 16
> >>>>> fc 89 0f 44 89 44 17 fc c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 83 ea
> >>>>> 01 72 19 <0f> b6 0e 74 12 4c 0f b6 46 01 4c 0f b6 0c 16 44 88 47 01 44
> >>>>> 88 0c
> >>>>> [  538.636726] RSP: 0018:ffff888018707628 EFLAGS: 00010202
> >>>>> [  538.641125] RAX: ffff888009098855 RBX: 0000000000000008 RCX: ffffffffc19eca2d
> >>>>> [  538.645793] RDX: 0000000000000001 RSI: 0005088000000000 RDI: ffff888009098855
> >>>>> [  538.650290] RBP: 0000000000000000 R08: ffffed100121310b R09: ffffed100121310b
> >>>>> [  538.654879] R10: ffff888009098856 R11: ffffed100121310a R12: ffff888018707788
> >>>>> [  538.658700] R13: ffff888009098858 R14: ffff888009098857 R15: 0000000000000002
> >>>>> [  538.662871] FS:  00007f12be1c8740(0000) GS:ffff88805ca40000(0000)
> >>>>> knlGS:0000000000000000
> >>>>> [  538.667424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [  538.670768] CR2: 00007f12be1c7000 CR3: 000000005c37a004 CR4: 00000000001706e0
> >>>>> [  538.675865] Call Trace:
> >>>>> [  538.677376]  nfs4_xdr_dec_listxattrs+0x31d/0x3c0 [nfsv4]
> >>>>> [  538.681151]  ? nfs4_xdr_dec_read_plus+0x360/0x360 [nfsv4]
> >>>>> [  538.684232]  ? xdr_inline_decode+0x1e/0x260 [sunrpc]
> >>>>> [  538.687114]  call_decode+0x365/0x390 [sunrpc]
> >>>>> [  538.689626]  ? rpc_decode_header+0x770/0x770 [sunrpc]
> >>>>> [  538.692410]  ? var_wake_function+0x80/0x80
> >>>>> [  538.694634]  ?
> >>>>> xprt_request_retransmit_after_disconnect.isra.15+0x5e/0x80 [sunrpc]
> >>>>> [  538.698920]  ? rpc_decode_header+0x770/0x770 [sunrpc]
> >>>>> [  538.701578]  ? rpc_decode_header+0x770/0x770 [sunrpc]
> >>>>> [  538.704252]  __rpc_execute+0x11c/0x6e0 [sunrpc]
> >>>>> [  538.706829]  ? trace_event_raw_event_xprt_cong_event+0x270/0x270 [sunrpc]
> >>>>> [  538.710654]  ? rpc_make_runnable+0x54/0xe0 [sunrpc]
> >>>>> [  538.713416]  rpc_run_task+0x29c/0x2c0 [sunrpc]
> >>>>> [  538.715806]  nfs4_call_sync_custom+0xc/0x40 [nfsv4]
> >>>>> [  538.718551]  nfs4_do_call_sync+0x114/0x160 [nfsv4]
> >>>>> [  538.721571]  ? nfs4_call_sync_custom+0x40/0x40 [nfsv4]
> >>>>> [  538.724333]  ? __alloc_pages_nodemask+0x200/0x410
> >>>>> [  538.726794]  ? kasan_unpoison_shadow+0x30/0x40
> >>>>> [  538.729147]  ? __kasan_kmalloc.constprop.8+0xc1/0xd0
> >>>>> [  538.731733]  _nfs42_proc_listxattrs+0x1f6/0x2f0 [nfsv4]
> >>>>> [  538.734504]  ? nfs42_offload_cancel_done+0x50/0x50 [nfsv4]
> >>>>> [  538.737173]  ? __kernel_text_address+0xe/0x30
> >>>>> [  538.739400]  ? unwind_get_return_address+0x2f/0x50
> >>>>> [  538.741727]  ? create_prof_cpu_mask+0x20/0x20
> >>>>> [  538.744141]  ? stack_trace_consume_entry+0x80/0x80
> >>>>> [  538.746585]  ? _raw_spin_lock+0x7a/0xd0
> >>>>> [  538.748477]  nfs42_proc_listxattrs+0xf4/0x150 [nfsv4]
> >>>>> [  538.750920]  ? nfs42_proc_setxattr+0x150/0x150 [nfsv4]
> >>>>> [  538.753557]  ? nfs4_xattr_cache_list+0x91/0x120 [nfsv4]
> >>>>> [  538.756313]  nfs4_listxattr+0x34d/0x3d0 [nfsv4]
> >>>>> [  538.758506]  ? _nfs4_proc_access+0x260/0x260 [nfsv4]
> >>>>> [  538.760859]  ? __ia32_sys_rename+0x40/0x40
> >>>>> [  538.762897]  ? selinux_quota_on+0xf0/0xf0
> >>>>> [  538.764827]  ? __check_object_size+0x178/0x220
> >>>>> [  538.767063]  ? kasan_unpoison_shadow+0x30/0x40
> >>>>> [  538.769233]  ? security_inode_listxattr+0x53/0x60
> >>>>> [  538.771962]  listxattr+0x5b/0xf0
> >>>>> [  538.773980]  path_listxattr+0xa1/0x100
> >>>>> [  538.776147]  ? listxattr+0xf0/0xf0
> >>>>> [  538.778064]  do_syscall_64+0x33/0x40
> >>>>> [  538.780714]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [  538.783440] RIP: 0033:0x7f12bdaebc8b
> >>>>> [  538.785341] Code: f0 ff ff 73 01 c3 48 8b 0d fa 21 2c 00 f7 d8 64
> >>>>> 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 c2 00 00
> >>>>> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 21 2c 00 f7 d8 64 89
> >>>>> 01 48
> >>>>> [  538.794845] RSP: 002b:00007ffc7947cff8 EFLAGS: 00000206 ORIG_RAX:
> >>>>> 00000000000000c2
> >>>>> [  538.798985] RAX: ffffffffffffffda RBX: 0000000001454650 RCX: 00007f12bdaebc8b
> >>>>> [  538.802840] RDX: 0000000000000018 RSI: 000000000145c150 RDI: 0000000001454650
> >>>>> [  538.807093] RBP: 000000000145c150 R08: 00007f12bddaeba0 R09: 00007ffc7947cc46
> >>>>> [  538.811487] R10: 0000000000000000 R11: 0000000000000206 R12: 00000000000002dd
> >>>>> [  538.816713] R13: 0000000000000018 R14: 0000000000000018 R15: 0000000000000000
> >>>>> [  538.820437] Modules linked in: rpcrdma rdma_rxe ip6_udp_tunnel
> >>>>> udp_tunnel rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ib_core cts
> >>>>> rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace nfs_ssc nls_utf8
> >>>>> isofs fuse rfcomm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
> >>>>> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
> >>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun bridge stp llc
> >>>>> ip6_tables nft_compat ip_set nf_tables nfnetlink bnep
> >>>>> vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
> >>>>> intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
> >>>>> vmw_balloon ghash_clmulni_intel joydev pcspkr btusb uvcvideo btrtl
> >>>>> btbcm btintel videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
> >>>>> videobuf2_common snd_ens1371 snd_ac97_codec ac97_bus snd_seq snd_pcm
> >>>>> videodev bluetooth mc rfkill ecdh_generic ecc snd_timer snd_rawmidi
> >>>>> snd_seq_device snd soundcore vmw_vmci i2c_piix4 auth_rpcgss sunrpc
> >>>>> ip_tables xfs libcrc32c sr_mod cdrom sg crc32c_intel ata_generic
> >>>>> serio_raw nvme vmwgfx drm_kms_helper
> >>>>> [  538.820577]  syscopyarea sysfillrect sysimgblt fb_sys_fops cec
> >>>>> nvme_core t10_pi ata_piix ahci libahci ttm vmxnet3 drm libata
> >>>>> [  538.869541] ---[ end trace 4abb2d95a72e9aab ]---
> >>>>
> >>>> I'm running a v5.10-rc3 client now with the fix applied and
> >>>> KASAN enabled. I traced my xfstests run and I'm definitely
> >>>> exercising the LISTXATTRS path:
> >>>>
> >>>> # trace-cmd report -F rpc_request | grep LISTXATTR | wc -l
> >>>> 462
> >>>>
> >>>> No crash or KASAN splat. I'm still missing something.
> >>>>
> >>>> I sometimes apply small patches from the mailing list by hand
> >>>> editing the modified file. Are you sure you applied my fix
> >>>> correctly?
> >>>
> >>> Again, I think the difference is Soft RoCE vs. hardware.
> >>
> >> I'm not finding that plausible... Can you troubleshoot this further
> >> to demonstrate how soft RoCE is contributing to this issue?
> >
> > Btw I'm not the only one. Anna's tests are failing too.
> >
> > OK, so all this started with the xattr patches (Trond's tags/nfs-for-5.9-1
> > doesn't pass generic/013). Your patch isn't sufficient to fix it.
>
> That might be true, but I'm not sure how you can reach that
> conclusion with certainty, because you don't yet have a root
> cause.

I don't understand the comment. The logic is exactly the same as in
your fix: when not enough memory is allocated for the reply, the
memory access lands in memory it shouldn't touch.
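
To spell out what I mean (this is my rough reading of the code, paraphrased
from fs/nfs/nfs42xdr.c and net/sunrpc/clnt.c rather than quoted verbatim):
the per-op decode maxsz rolls up into the compound reply size, and
call_allocate() sizes the receive buffer from that, so an undercounted
decode_listxattrs_maxsz directly shrinks the buffer the transport copies
the reply into:

    #define NFS4_dec_listxattrs_sz (compound_decode_hdr_maxsz + \
                                    decode_sequence_maxsz + \
                                    decode_putfh_maxsz + \
                                    decode_listxattrs_maxsz)

    /* call_allocate(), paraphrased: p_replen is NFS4_dec_listxattrs_sz,
     * counted in 4-byte XDR words, so a too-small maxsz means a
     * too-small receive buffer. */
    req->rq_rcvsize = RPC_REPHDRSIZE + auth->au_rslack + proc->p_replen;
    req->rq_rcvsize <<= 2;      /* words -> bytes */

If that buffer is short, the memcpy in rpcrdma_complete_rqst() runs past
the end of it, which is what KASAN is flagging above.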


>
> > When I increased decode_listxattrs_maxsz by 5, the oops went away (but
> > that's a hack). Here's a fix, on top of yours, that allows my tests to
> > run.
> >
> > diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > index d0ddf90c9be4..49573b285946 100644
> > --- a/fs/nfs/nfs42xdr.c
> > +++ b/fs/nfs/nfs42xdr.c
> > @@ -179,7 +179,7 @@
> >                                   1 + nfs4_xattr_name_maxsz + 1)
> >  #define decode_setxattr_maxsz   (op_decode_hdr_maxsz + decode_change_info_maxsz)
> >  #define encode_listxattrs_maxsz  (op_encode_hdr_maxsz + 2 + 1)
> > -#define decode_listxattrs_maxsz  (op_decode_hdr_maxsz + 2 + 1 + 1 + 1)
> > +#define decode_listxattrs_maxsz  (op_decode_hdr_maxsz + 2 + 1 + 1 + 5)
> >  #define encode_removexattr_maxsz (op_encode_hdr_maxsz + 1 + \
> >                                    nfs4_xattr_name_maxsz)
> >  #define decode_removexattr_maxsz (op_decode_hdr_maxsz + \
> >
> > I would like Frank to comment on what decode_listxattrs_maxsz
> > should be. According to the RFC:
> >
> >   /// struct LISTXATTRS4resok {
> >   ///         nfs_cookie4    lxr_cookie;
> >   ///         xattrkey4      lxr_names<>;
> >   ///         bool           lxr_eof;
> >   /// };
> >
> > This is the current code (with Chuck's fix): op_decode_hdr_maxsz + 2 + 1 + 1 + 1
> >
> > I don't see how the xattrkey4 lxr_names<> array can be accounted for by a
> > constant.
>
> That's right, the size of xattrkey4 is not provided by the maxsz
> constant.
>
> nfs4_xdr_enc_listxattrs() invokes rpc_prepare_reply_pages() to set
> up the pages needed for the variable part of the reply.
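
If I'm reading the encode side right, it hands the xattr pages to the
transport for the variable part with something like the call below
(argument and field names from memory, so treat this as a sketch, not a
quote of nfs42xdr.c):

    rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
                            hdr.replen);

So only the fixed part of LISTXATTRS4resok (cookie, array count, eof)
should have to fit in the space reserved by decode_listxattrs_maxsz, with
the names themselves received into the pages -- which is why I'm confused
that bumping the maxsz makes the oops go away.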
>
>
> > So I think the real fix should be:
> >
> > diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > index d0ddf90c9be4..73b44f8c036d 100644
> > --- a/fs/nfs/nfs42xdr.c
> > +++ b/fs/nfs/nfs42xdr.c
> > @@ -179,7 +179,8 @@
> >                                   1 + nfs4_xattr_name_maxsz + 1)
> >  #define decode_setxattr_maxsz   (op_decode_hdr_maxsz + decode_change_info_maxsz)
> >  #define encode_listxattrs_maxsz  (op_encode_hdr_maxsz + 2 + 1)
> > -#define decode_listxattrs_maxsz  (op_decode_hdr_maxsz + 2 + 1 + 1 + 1)
> > +#define decode_listxattrs_maxsz  (op_decode_hdr_maxsz + 2 + 1 + 1 + \
> > +                                  XDR_QUADLEN(NFS4_OPAQUE_LIMIT))
> >  #define encode_removexattr_maxsz (op_encode_hdr_maxsz + 1 + \
> >                                    nfs4_xattr_name_maxsz)
> >  #define decode_removexattr_maxsz (op_decode_hdr_maxsz + \
> >
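For scale, assuming the usual definitions (XDR_QUADLEN() from
include/linux/sunrpc/xdr.h and NFS4_OPAQUE_LIMIT from include/linux/nfs4.h):

    #define XDR_QUADLEN(l)      (((l) + 3) >> 2)   /* bytes -> 4-byte XDR words */
    #define NFS4_OPAQUE_LIMIT   1024

so the proposed maxsz reserves XDR_QUADLEN(1024) = 256 extra words (1KB),
i.e. enough extra head room for one NFS4_OPAQUE_LIMIT-sized name on top of
the fixed fields: cookie (2 words), array count (1 word) and eof (1 word).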
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever