RE: Kernel panic over qedr with

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Kalderon, Michal
> Sent: Wednesday, January 24, 2018 3:10 PM
> To: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx
> Subject: RE: Kernel panic over qedr with
> 
> [This sender failed our fraud detection checks and may not be who they
> appear to be. Learn about spoofing at http://aka.ms/LearnAboutSpoofing]
> 
> > From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
> > Sent: Tuesday, January 23, 2018 7:14 PM
> > > On Jan 23, 2018, at 9:05 AM, Kalderon, Michal
> > <Michal.Kalderon@xxxxxxxxxx> wrote:
> > >
> > > Hi Chuck,
> > >
> > > I’m hitting issues with NFS over qedr (either iWARP / RoCE)
> > >
> > > I get a kernel panic during mount.
> > > Looks like it started with 4.15-rcx
> > > Looking at qedr code it looks like the wr we get may be corrupted/
> > > Below the stack trace and following logs with RCP debug enabled…
> > >
> > > Can you please advise on how to proceed with debugging ?
> >
> > I don't see anything distinctive in the log output.
> > I would start with a git bisect if your reproducer hits the oops every time.
> 
> On it, thanks.
Hi Chuck, 

Git bisect helped in finding this commit as the one introducing the oops: 

01bb35c89d90abe6fd1c0be001f84bbdfa7fa7d1 is the first bad commit
commit 01bb35c89d90abe6fd1c0be001f84bbdfa7fa7d1
Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
Date:   Fri Oct 20 10:48:36 2017 -0400

    xprtrdma: RPC completion should wait for Send completion


I can't see the issue in the patch, but am lacking some more knowledge in implementation. 
Not sure why or whether it helps, but we see the oops consistently with iWARP. And intermittently with RoCE. 
Can you please take a look ? 


Thanks,
Michal

> >
> >
> > > Thanks,
> > > Michal
> > >
> > > [  782.951762] BUG: unable to handle kernel NULL pointer dereference
> > > at 0000000000000040 [  782.952286] IP: __qedr_post_send+0x85/0x127c
> > > [qedr] [  782.952797] PGD 0 P4D 0 [  782.953293] Oops: 0000 [#1] SMP
> > > PTI [  782.953781] Modules linked in: qedr(E) qede qed
> > > rpcsec_gss_krb5
> > > nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle
> > > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4
> > nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> > libcrc32c ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter
> > ebtables ip6table_filter ip6_tables iptable_filter bonding rpcrdma
> > ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp
> > scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
> rdma_cm
> > ib_cm iw_cm ib_core intel_powerclamp coretemp kvm irqbypass
> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> > crypto_simd glue_helper cryptd iTCO_wdt gpio_ich iTCO_vendor_support
> > ipmi_si pcspkr sg lpc_ich
> > i2c_i801 ipmi_devintf ipmi_msghandler ioatdma i7core_edac shpchp [
> > 782.957637]  acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> > ip_tables ext4 mbcache jbd2 sr_mod sd_mod cdrom ata_generic pata_acpi
> > mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > ttm drm igb ata_piix ptp libata pps_core e1000 crc32c_intel dca
> > i2c_algo_bit i2c_core
> > > [  782.959572] CPU: 7 PID: 3422 Comm: mount.nfs Tainted: G            E
> > 4.15.0-rc9+ #2
> > > [  782.960232] Hardware name: Intel Corporation S5520HC/S5520HC,
> > > BIOS
> > > S5500.86B.01.00.0059.082320111421 08/23/2011 [  782.960901] RIP:
> > > 0010:__qedr_post_send+0x85/0x127c [qedr] [  782.961575] RSP:
> > > 0018:ffffab3d8259b6f8 EFLAGS: 00010082 [  782.962244] RAX:
> > > 0000000000000025 RBX: 0000000000000028 RCX: 0000000000000006 [
> > > 782.962907] RDX: 0000000000000000 RSI: 0000000000000092 RDI:
> > > ffff9b275fd968f0 [  782.963573] RBP: ffffab3d8259b780 R08:
> > > 00000000000004cc R09: 0000000000000000 [  782.964242] R10:
> > > ffff9b2753218540 R11: ffff9b25d9d67400 R12: ffff9b25d60aa000 [
> > > 782.964894] R13: 0000000000000028 R14: 0000000000000000 R15:
> > > ffff9b259400e400 [  782.965557] FS:  00007f91dad3a880(0000)
> > > GS:ffff9b275fd80000(0000) knlGS:0000000000000000 [  782.966235] CS:
> > > 0010
> > DS: 0000 ES: 0000 CR0: 0000000080050033 [  782.966906] CR2:
> > 0000000000000040 CR3: 0000000393dbc003 CR4: 00000000000206e0 [
> > 782.967593] Call Trace:
> > > [  782.968271]  qedr_post_send+0x12e/0x180 [qedr] [  782.968950]
> > > rpcrdma_ep_post+0x83/0xf0 [rpcrdma] [  782.969637]
> > > xprt_rdma_send_request+0x84/0xe0 [rpcrdma] [  782.970349]
> > > xprt_transmit+0x6c/0x370 [sunrpc] [  782.971057]  ?
> > > call_decode+0x820/0x820 [sunrpc] [  782.971759]  ?
> > > call_decode+0x820/0x820 [sunrpc] [  782.972464]
> > > call_transmit+0x194/0x280 [sunrpc] [  782.973174]
> > > __rpc_execute+0x7e/0x3f0 [sunrpc] [  782.973867]
> > > rpc_run_task+0x106/0x150 [sunrpc] [  782.974577]
> > > nfs4_proc_setclientid+0x213/0x380 [nfsv4] [  782.975279]
> > > nfs40_discover_server_trunking+0x80/0xe0 [nfsv4] [  782.975982]
> > > nfs4_discover_server_trunking+0x78/0x2b0 [nfsv4] [  782.976688]
> > > nfs4_init_client+0x11b/0x260 [nfsv4] [  782.977402]  ?
> > > __rpc_init_priority_wait_queue+0x83/0xb0 [sunrpc] [  782.978129]  ?
> > > nfs4_alloc_client+0x15f/0x200 [nfsv4] [  782.978853]  ?
> > > nfs_get_client+0x2c1/0x360 [nfs] [  782.979583]  ?
> > > pcpu_alloc_area+0xc0/0x130 [  782.980317]  nfs4_set_client+0x9d/0xe0
> > > [nfsv4]
> > >
> > >
> > > RCP Logs
> > > [  782.948548] RPC:   209 xmit complete
> > > [  782.948549] RPC:   209 sleep_on(queue "xprt_pending" time
> 4295450237)
> > > [  782.948550] RPC:   209 added to queue 000000006f382a4e
> "xprt_pending"
> > > [  782.948550] RPC:   209 setting alarm for 60000 ms
> > > [  782.948552] RPC:   209 sync task going to sleep
> > > [  782.948808] RPC:       rpcrdma_wc_receive: rep 0000000067b47231
> opcode
> > 'recv', length 52: success
> > > [  782.948811] RPC:       rpcrdma_reply_handler: incoming rep
> > 0000000067b47231
> > > [  782.948814] RPC:       rpcrdma_reply_handler: reply 0000000067b47231
> > completes request 0000000096e058c2 (xid 0x4e366e3c)
> > > [  782.948869] RPC:       rpcrdma_inline_fixup: srcp 0x00000000a5e2c17f len
> > 24 hdrlen 24
> > > [  782.948871] RPC:       wake_up_first(0000000004dd7309 "xprt_sending")
> > > [  782.948874] RPC:   209 xid 4e366e3c complete (24 bytes received)
> > > [  782.948875] RPC:   209 __rpc_wake_up_task (now 4295450237)
> > > [  782.948876] RPC:   209 disabling timer
> > > [  782.948877] RPC:   209 removed from queue 000000006f382a4e
> > "xprt_pending"
> > > [  782.948881] RPC:       __rpc_wake_up_task done
> > > [  782.948916] RPC:   209 sync task resuming
> > > [  782.948918] RPC:   209 call_status (status 24)
> > > [  782.948919] RPC:   209 call_decode (status 24)
> > > [  782.948921] RPC:   209 validating NULL cred 000000000941bc29
> > > [  782.948923] RPC:   209 using AUTH_NULL cred 000000000941bc29 to
> > unwrap rpc data
> > > [  782.948925] RPC:   209 call_decode result 0
> > > [  782.948926] RPC:   209 return 0, status 0
> > > [  782.948927] RPC:   209 release task
> > > [  782.948929] RPC:       xprt_rdma_free: called on 0x0000000067b47231
> > > [  782.948963] RPC:   209 release request 00000000088290f9
> > > [  782.948964] RPC:       wake_up_first(0000000066fcc180 "xprt_backlog")
> > > [  782.948966] RPC:       rpc_release_client(00000000d13a4ef8)
> > > [  782.948968] RPC:   209 freeing task
> > > [  782.949065] svc: svc_destroy(NFSv4 callback, 2)
> > > [  782.949069] RPC:       new task initialized, procpid 3422
> > > [  782.949070] RPC:       allocated task 00000000407c4b9f
> > > [  782.949071] RPC:   210 __rpc_execute flags=0x5280
> > > [  782.949072] RPC:   210 call_start nfs4 proc SETCLIENTID (sync)
> > > [  782.949073] RPC:   210 call_reserve (status 0)
> > > [  782.949074] RPC:   210 reserved req 00000000088290f9 xid 4f366e3c
> > > [  782.949076] RPC:   210 call_reserveresult (status 0)
> > > [  782.949077] RPC:   210 call_refresh (status 0)
> > > [  782.949078] RPC:   210 refreshing UNIX cred 000000005a6be29a
> > > [  782.949079] RPC:   210 call_refreshresult (status 0)
> > > [  782.949080] RPC:   210 call_allocate (status 0)
> > > [  782.949081] RPC:   210 xprt_rdma_allocate: send size = 1456, recv size =
> > 276, req = 00000000f056c1ce
> > > [  782.949082] RPC:   210 call_bind (status 0)
> > > [  782.949083] RPC:   210 call_connect xprt 00000000e40824fb is connected
> > > [  782.949083] RPC:   210 call_transmit (status 0)
> > > [  782.949084] RPC:   210 xprt_prepare_transmit
> > > [  782.949085] RPC:   210 xprt_cwnd_limited cong = 0 cwnd = 8192
> > > [  782.949085] RPC:   210 rpc_xdr_encode (status 0)
> > > [  782.949086] RPC:   210 marshaling UNIX cred 000000005a6be29a
> > > [  782.949088] RPC:   210 using AUTH_UNIX cred 000000005a6be29a to
> wrap
> > rpc data
> > > [  782.949090] RPC:   210 xprt_transmit(192)
> > > [  782.949091] RPC:   210 rpcrdma_marshal_req: inline/inline: hdrlen 28
> > rpclen
> > > [  782.949093] RPC:       rpcrdma_ep_post: posting 2 s/g entries
> >
> > --
> > Chuck Lever
> >
> >
> 
>   칻 & ~ &   +-  ݶ  w  ˛   m b  kvf   ^n r   z   h    &   G   h ( 階 ݢj"   m     z ޖ
> f   h   ~ m
��.n��������+%������w��{.n�����{���fk��ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux