Re: [4.13-rc1 regression] fstests generic/013 crashed nfsd on ppc64 host

On Mon, 2017-07-17 at 20:08 +0800, Eryu Guan wrote:
> Hi all,
> 
> I hit an nfsd crash in fstests generic/013 on a 4.13-rc1 kernel with
> NFS versions 4.0/4.1/4.2; v3 passed the test. It only happens on
> ppc64/ppc64le hosts for me. git bisect pointed to the first bad commit:
> 
> commit 1c5876ddbdb401f814ef717394826e7dfb6704d4
> Author: Christoph Hellwig <hch@xxxxxx>
> Date:   Mon May 8 23:27:10 2017 +0200
> 
>     sunrpc: move p_count out of struct rpc_procinfo
> 
>     p_count is the only writeable member of struct rpc_procinfo,
> which is
>     a good candidate to be const-ified as it contains function
> pointers.
> 
>     This patch moves it out of struct rpc_procinfo, and into a
>     separate writable array that is pointed to by struct rpc_version
> and
>     indexed by p_statidx.
> 
>     Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> 
> I was testing with a locally mounted NFS share, but I can also
> reproduce it by running generic/013 from a remote NFS client. If you
> need more information, please let me know.
> 
> Thanks,
> Eryu
> 
> [  992.581712] run fstests generic/013 at 2017-07-16 07:30:42 
> [  993.895088] Unable to handle kernel paging request for data at
> address 0x2f7362696e2f6e76 
> [  993.895113] Faulting instruction address: 0xd000000006660428 
> [  993.895121] Oops: Kernel access of bad area, sig: 11 [#1] 
> [  993.895126] SMP NR_CPUS=2048  
> [  993.895127] NUMA  
> [  993.895130] pSeries 
> [  993.895137] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver
> nfs fscache ext4 mbcache jbd2 nx_crypto sg pseries_rng nfsd
> auth_rpcgss nfs_acl lockd sunrpc grace ip_tables xfs libcrc32c sd_mod
> ibmvscsi scsi_transport_srp ibmveth 
> [  993.895168] CPU: 11 PID: 335 Comm: kworker/11:1 Not tainted
> 4.13.0-rc1 #1 
> [  993.895197] Workqueue: rpciod .rpc_async_schedule [sunrpc] 
> [  993.895203] task: c0000001f94cf780 task.stack: c0000001f952c000 
> [  993.895208] NIP: d000000006660428 LR: d0000000066748d4 CTR:
> d0000000066603d0 
> [  993.895214] REGS: c0000001f952f7e0 TRAP: 0380   Not
> tainted  (4.13.0-rc1) 
> [  993.895219] MSR: 800000000280b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI> 
> [  993.895225]   CR: 22004024  XER: 00000001 
> [  993.895233] CFAR: d0000000066748d0 SOFTE: 1  
> [  993.895233] GPR00: d0000000066748d4 c0000001f952fa60
> d0000000066b5d78 c0000001bcee7d00  
> [  993.895233] GPR04: c0000000fefc19e8 c0000001bcee7d48
> 002d1e7473db58e8 0000000000000001  
> [  993.895233] GPR08: d0000000079dd588 2f7362696e2f6e66
> 0000000000000008 d0000000079d45f8  
> [  993.895233] GPR12: d000000006660010 c00000000e986e00
> c000000000110ab0 c0000001f81d0040  
> [  993.895233] GPR16: 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000  
> [  993.895233] GPR20: 0000000000000000 fffffffffffffe00
> 0000000000000000 0000000000000001  
> [  993.895233] GPR24: d0000000066b9f34 c0000001bcee7d30
> 0000000000000000 d0000000066aac68  
> [  993.895233] GPR28: c0000001bc79cc00 0000000000000001
> c0000001bc79cc00 c0000001bcee7d00  
> [  993.895313] NIP [d000000006660428] .call_start+0x58/0x120
> [sunrpc] 
> [  993.895337] LR [d0000000066748d4] .__rpc_execute+0xc4/0x540
> [sunrpc] 
> [  993.895342] Call Trace: 
> [  993.895346] [c0000001f952fa60] [0000000000000001] 0x1
> (unreliable) 
> [  993.895370] [c0000001f952faf0] [d0000000066748d4]
> .__rpc_execute+0xc4/0x540 [sunrpc] 
> [  993.895379] [c0000001f952fbe0] [c000000000108e74]
> .process_one_work+0x194/0x480 
> [  993.895387] [c0000001f952fc90] [c0000000001091e8]
> .worker_thread+0x88/0x510 
> [  993.895393] [c0000001f952fd70] [c000000000110c0c]
> .kthread+0x15c/0x1a0 
> [  993.895401] [c0000001f952fe30] [c00000000000b520]
> .ret_from_kernel_thread+0x58/0xb8 
> [  993.895407] Instruction dump: 
> [  993.895411] e9430078 ebc300a8 7928ffe3 ebaa0026 40c2006c e95e0180
> 80fe0044 e90a0010  
> [  993.895421] 78ea1f24 7d28502a 2fa90000 419e0018 <e9490010>
> 7ba91764 7d0a482e 39080001  
> [  993.895433] ---[ end trace aeee2c84dc1574c0 ]--- 
> 
> And gdb shows:
> 
> (gdb) l *(call_start+0x60)
> 0x4b0 is in call_start (net/sunrpc/clnt.c:1529).
> 1524                            rpc_proc_name(task),
> 1525                            (RPC_IS_ASYNC(task) ? "async" :
> "sync"));
> 1526
> 1527            /* Increment call count (version might not be valid
> for ping) */
> 1528            if (clnt->cl_program->version[clnt->cl_vers])
> 1529                    clnt->cl_program->version[clnt->cl_vers]-
> >counts[idx]++;
> 1530            clnt->cl_stats->rpccnt++;
> 1531            task->tk_action = call_reserve;
> 1532    }
> 1533
> 
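For anyone following along, the layout described in the quoted commit message (a read-only procedure table plus a separate writable counter array indexed per procedure) can be sketched roughly as below. This is a simplified userspace illustration, not the kernel code: struct proc_info, struct version_info, count_call() and the demo_* tables are hypothetical names, and only "counts" and the idea of a per-procedure stat index come from the text above. The bounds check is an assumption made for the sketch and is not the fix referred to in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real sunrpc definitions. */
struct proc_info {
    const char *name;       /* procedure name, for illustration only */
    unsigned int statidx;   /* index into the separate counter array */
};

struct version_info {
    unsigned int nrprocs;            /* entries in procs[] and counts[] */
    const struct proc_info *procs;   /* read-only, so it can be const */
    unsigned int *counts;            /* the only writable part */
};

/*
 * Bump the call counter for one procedure.
 * Returns 0 on success, -1 if there is no version, no counter array,
 * or the index is out of range (an assumed check for this sketch).
 */
static int count_call(const struct version_info *v, const struct proc_info *p)
{
    assert(p != NULL);
    if (v == NULL || v->counts == NULL || p->statidx >= v->nrprocs)
        return -1;
    v->counts[p->statidx]++;
    return 0;
}

/* Illustrative tables; the two procedure names are just examples. */
static unsigned int demo_counts[2];
static const struct proc_info demo_procs[2] = {
    { "NULL", 0 },
    { "COMPOUND", 1 },
};
static const struct version_info demo_version = { 2, demo_procs, demo_counts };
```

With this shape, count_call(&demo_version, &demo_procs[1]) increments demo_counts[1] only, and the procedure table itself never needs to be written to.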

Please see the patch that I posted yesterday in response to Dave Jones'
report of the same issue.

Bruce, do you want me to resend?

Thanks
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx