Re: kernel oops in generic/013 on an rdma mount (over either soft roce or iwarp)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Nov 9, 2020, at 6:17 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> On Mon, Nov 9, 2020 at 6:07 PM Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>> 
>> On Mon, Nov 9, 2020 at 6:01 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> 
>>>> On Nov 9, 2020, at 5:55 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>> 
>>>> Hi Chuck,
>>>> 
>>>> generic/013 on 5.10-rc3 under both soft RoCE and iWarp produce the
>>>> following kernel oops.
>>>> Are you aware of it? 5.9 ran fine. In 5.10-rc1/rc2 both soft RoCE and
>>>> iWarp were broken (outside of nfs) so can't test. I'll see what I can
>>>> find out more but wanted to run it by you first. Thank you.
>>> 
>>> Could be this:
>>> 
>>> https://lore.kernel.org/linux-nfs/160416263202.2615192.7554388264467271587.stgit@xxxxxxxxxxxxxxxxxxxxx/T/#u
>> 
>> So what does that mean: are you planning to post this patch? That
>> patch never ended in even 5.10-rc3?

The URL refers to a linux-nfs mail archive, so that patch has already
been posted (in October). The client maintainers need to merge it.


> Which those changes applied, I get the following oops:

What's your workload? Do you have a reproducer?

What's the output of

$ scripts/faddr2line net/sunrpc/xprtrdma/rpc_rdma.o rpcrdma_complete_rqst+0x294

(On my system it's in the middle of rpcrdma_inline_fixup(), for example).


> [   54.501538] run fstests generic/013 at 2020-11-09 18:10:16
> [   65.555863] general protection fault, probably for non-canonical
> address 0x28fb180000000: 0000 [#1] SMP PTI
> [   65.562715] CPU: 0 PID: 490 Comm: kworker/0:1H Not tainted 5.10.0-rc3+ #32
> [   65.566089] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> [   65.571259] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [   65.574099] RIP: 0010:rpcrdma_complete_rqst+0x294/0x400 [rpcrdma]
> [   65.577254] Code: 4c 63 c2 48 c1 f9 06 48 c1 e1 0c 48 03 0d c4 88
> ed e9 48 01 f1 49 83 f8 08 0f 82 68 ff ff ff 48 8b 30 48 8d 79 08 48
> 83 e7 f8 <48> 89 31 4a 8b 74 00 f8 4a 89 74 01 f8 48 29 f9 48 89 c6 48
> 29 ce
> [   65.587561] RSP: 0018:ffffadbcc18efdd8 EFLAGS: 00010202
> [   65.590890] RAX: ffff98a1ddbd208c RBX: ffff98a1b0c20fc0 RCX: 00028fb180000000
> [   65.594829] RDX: 0000000000000008 RSI: 0100000000003178 RDI: 00028fb180000008
> [   65.598956] RBP: ffff98a1ba249200 R08: 0000000000000008 R09: 0000000000000008
> [   65.602641] R10: ffff98a1b0c20fb8 R11: 0000000000000008 R12: ffff98a1f44b8010
> [   65.607044] R13: 0000000000000000 R14: 0000000000000078 R15: 0000000000001000
> [   65.611062] FS:  0000000000000000(0000) GS:ffff98a1fbe00000(0000)
> knlGS:0000000000000000
> [   65.615928] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   65.620071] CR2: 00007f048c00b668 CR3: 0000000005bde005 CR4: 00000000001706f0
> [   65.623661] Call Trace:
> [   65.624907]  __ib_process_cq+0x89/0x150 [ib_core]
> [   65.627238]  ib_cq_poll_work+0x26/0x80 [ib_core]
> [   65.629623]  process_one_work+0x1a4/0x340
> [   65.632506]  ? process_one_work+0x340/0x340
> [   65.634627]  worker_thread+0x30/0x370
> [   65.636395]  ? process_one_work+0x340/0x340
> [   65.639333]  kthread+0x116/0x130
> [   65.642022]  ? kthread_park+0x80/0x80
> [   65.645183]  ret_from_fork+0x22/0x30
> [   65.647019] Modules linked in: cts rpcsec_gss_krb5 nfsv4
> dns_resolver nfs lockd grace nfs_ssc rpcrdma rdma_ucm rdma_cm iw_cm
> ib_cm ib_uverbs siw ib_core nls_utf8 isofs fuse rfcomm nft_fib_inet
> nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
> nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 tun bridge stp llc ip6_tables nft_compat ip_set
> nf_tables nfnetlink bnep vmw_vsock_vmci_transport vsock snd_seq_midi
> snd_seq_midi_event intel_rapl_msr intel_rapl_common crct10dif_pclmul
> crc32_pclmul vmw_balloon ghash_clmulni_intel btusb btrtl btbcm btintel
> pcspkr joydev uvcvideo snd_ens1371 videobuf2_vmalloc snd_ac97_codec
> videobuf2_memops ac97_bus videobuf2_v4l2 videobuf2_common bluetooth
> snd_seq videodev rfkill snd_pcm mc ecdh_generic ecc snd_timer
> snd_rawmidi snd_seq_device snd soundcore vmw_vmci i2c_piix4
> auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg
> crc32c_intel ata_generic serio_raw vmwgfx nvme drm_kms_helper
> syscopyarea sysfillrect sysimgblt
> [   65.647074]  nvme_core t10_pi fb_sys_fops ata_piix ahci libahci
> vmxnet3 cec ttm libata drm
> [   65.705629] ---[ end trace acdae4b270638f48 ]---
> 
> 
>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> [  126.767318] run fstests generic/013 at 2020-11-09 17:03:25
>>>> [  126.931805] BUG: unable to handle page fault for address: ffffa085363bb010
>>>> [  126.935622] #PF: supervisor write access in kernel mode
>>>> [  126.938202] #PF: error_code(0x0003) - permissions violation
>>>> [  126.941042] PGD 3fe02067 P4D 3fe02067 PUD 3fe06067 PMD 74e77063 PTE
>>>> 80000000763bb061
>>>> [  126.944882] Oops: 0003 [#1] SMP PTI
>>>> [  126.946985] CPU: 0 PID: 2924 Comm: fsstress Not tainted 5.10.0-rc3+ #32
>>>> [  126.950482] Hardware name: VMware, Inc. VMware Virtual
>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
>>>> [  126.955680] RIP: 0010:rpcrdma_convert_iovs.isra.32+0x125/0x190 [rpcrdma]
>>>> [  126.959175] Code: 03 74 70 83 f9 05 74 6b 49 8b 45 18 48 85 c0 74
>>>> 43 49 8b 4d 10 89 c2 89 ce 81 e6 ff 0f 00 00 85 c0 74 31 bf 00 10 00
>>>> 00 89 f8 <49> 89 48 10 29 f0 49 c7 40 08 00 00 00 00 39 d0 0f 47 c2 49
>>>> 83 c0
>>>> [  126.968901] RSP: 0018:ffffc32703137a68 EFLAGS: 00010286
>>>> [  126.971423] RAX: 0000000000001000 RBX: 0000000000000000 RCX: ffffa08542daf000
>>>> [  126.974807] RDX: 00000000f34df06c RSI: 0000000000000000 RDI: 0000000000001000
>>>> [  126.978224] RBP: 0000000000000000 R08: ffffa085363bb000 R09: 0000000000001000
>>>> [  126.982701] R10: ffffeef9c0006f48 R11: ffffa0853ffd60c0 R12: 000000000000cb35
>>>> [  126.986327] R13: ffffa0853628a060 R14: ffffa08534f195d0 R15: ffffa0851e213358
>>>> [  126.989769] FS:  00007fab74973740(0000) GS:ffffa0853be00000(0000)
>>>> knlGS:0000000000000000
>>>> [  126.993803] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  126.996953] CR2: ffffa085363bb010 CR3: 0000000074fd0002 CR4: 00000000001706f0
>>>> [  127.000593] Call Trace:
>>>> [  127.001907]  rpcrdma_marshal_req+0x4b9/0xb30 [rpcrdma]
>>>> [  127.004789]  ? lock_timer_base+0x67/0x80
>>>> [  127.006710]  xprt_rdma_send_request+0x48/0xd0 [rpcrdma]
>>>> [  127.009257]  xprt_transmit+0x130/0x3f0 [sunrpc]
>>>> [  127.011499]  ? rpc_clnt_swap_deactivate+0x30/0x30 [sunrpc]
>>>> [  127.014225]  ?
>>>> rpc_wake_up_task_on_wq_queue_action_locked+0x230/0x230 [sunrpc]
>>>> [  127.017848]  call_transmit+0x63/0x70 [sunrpc]
>>>> [  127.019973]  __rpc_execute+0x75/0x3e0 [sunrpc]
>>>> [  127.022135]  ? xprt_iter_get_helper+0x17/0x30 [sunrpc]
>>>> [  127.024793]  rpc_run_task+0x153/0x170 [sunrpc]
>>>> [  127.027098]  nfs4_call_sync_custom+0xb/0x30 [nfsv4]
>>>> [  127.029617]  nfs4_do_call_sync+0x69/0x90 [nfsv4]
>>>> [  127.032001]  _nfs42_proc_listxattrs+0x143/0x200 [nfsv4]
>>>> [  127.034766]  nfs42_proc_listxattrs+0x8e/0xc0 [nfsv4]
>>>> [  127.037160]  nfs4_listxattr+0x1b8/0x210 [nfsv4]
>>>> [  127.039454]  ? __check_object_size+0x162/0x180
>>>> [  127.041606]  listxattr+0xd1/0xf0
>>>> [  127.043163]  path_listxattr+0x5f/0xb0
>>>> [  127.044969]  do_syscall_64+0x33/0x40
>>>> [  127.047200]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [  127.049644] RIP: 0033:0x7fab74296c8b
>>>> [  127.051440] Code: f0 ff ff 73 01 c3 48 8b 0d fa 21 2c 00 f7 d8 64
>>>> 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 c2 00 00
>>>> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 21 2c 00 f7 d8 64 89
>>>> 01 48
>>>> [  127.060978] RSP: 002b:00007fffcddc4a38 EFLAGS: 00000202 ORIG_RAX:
>>>> 00000000000000c2
>>>> [  127.064848] RAX: ffffffffffffffda RBX: 000000000000002a RCX: 00007fab74296c8b
>>>> [  127.068244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000674440
>>>> [  127.071642] RBP: 00000000000001f4 R08: 0000000000000000 R09: 00007fffcddc4687
>>>> [  127.075214] R10: 0000000000000004 R11: 0000000000000202 R12: 000000000000002a
>>>> [  127.078667] R13: 0000000000403e60 R14: 0000000000000000 R15: 0000000000000000
>>>> [  127.082783] Modules linked in: cts rpcsec_gss_krb5 nfsv4
>>>> dns_resolver nfs lockd grace nfs_ssc rpcrdma rdma_rxe ip6_udp_tunnel
>>>> udp_tunnel rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ib_core nls_utf8
>>>> isofs fuse rfcomm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
>>>> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tun bridge stp llc
>>>> ip6_tables nft_compat ip_set nf_tables nfnetlink bnep
>>>> vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event
>>>> intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul
>>>> vmw_balloon ghash_clmulni_intel joydev btusb btrtl pcspkr btbcm
>>>> btintel uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
>>>> videobuf2_common videodev snd_ens1371 bluetooth snd_ac97_codec
>>>> ac97_bus rfkill mc snd_seq snd_pcm ecdh_generic ecc snd_timer
>>>> snd_rawmidi snd_seq_device snd soundcore vmw_vmci i2c_piix4
>>>> auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic
>>>> vmwgfx drm_kms_helper nvme crc32c_intel serio_raw
>>>> [  127.082841]  syscopyarea sysfillrect sysimgblt fb_sys_fops
>>>> nvme_core t10_pi cec vmxnet3 ata_piix ahci libahci ttm libata drm
>>>> [  127.132635] CR2: ffffa085363bb010
>>>> [  127.134527] ---[ end trace 912ce02a00d98fdf ]---
>>> 
>>> --
>>> Chuck Lever

--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux