Re: [PATCH] SUNRPC: Change WARN_ON to an warn message in rpc_exit_task

Kinglong Mee <kinglongmee@xxxxxxxxx> · Tue, 18 Aug 2015 14:25:43 +0800

On 8/17/2015 21:35, Scott Mayhew wrote:
> On Mon, 17 Aug 2015, Kinglong Mee wrote:
> 
>> When testing nfs4.0 lock, I get WARN_ON message as below,
>> [  278.733391] ------------[ cut here ]------------
>> [  278.734496] WARNING: CPU: 0 PID: 263 at /root/linux-nfs/net/sunrpc/sched.c:692
>> rpc_exit_task+0x85/0x90 [sunrpc]()
>> [  278.736771] Modules linked in: nfsv4(OE) nfs(OE) nfsd(E) bnep bluetooth ip6t_rpfilter ip6t_REJECT
>> xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
>> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
>> ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>> nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs ppdev xfs
>> microcode xor serio_raw libcrc32c pcspkr raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core
>> auth_rpcgss nfs_acl lockd sunrpc(OE) ata_generic pata_acpi [last unloaded: nfsd]
>> [  278.751462] CPU: 0 PID: 263 Comm: kworker/0:4 Tainted: G           OE 3.16.0-rc2+ #28
>> [  278.753364] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
>> BIOS 6.00 07/31/2013
>> [  278.755710] Workqueue: rpciod rpc_async_schedule [sunrpc]
>> [  278.757112]  0000000000000000 00000000f6173352 ffff880072e5fd30 ffffffff816eacfa
>> [  278.758811]  0000000000000000 ffff880072e5fd68 ffffffff8108bf3d ffff88005997f180
>> [  278.760527]  ffff88003ae44830 ffffffffa001a860 ffffffffa001a860 0000000000000000
>> [  278.762271] Call Trace:
>> [  278.762802]  [<ffffffff816eacfa>] dump_stack+0x45/0x56
>> [  278.763906]  [<ffffffff8108bf3d>] warn_slowpath_common+0x7d/0xa0
>> [  278.765419]  [<ffffffffa001a860>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
>> [  278.767243]  [<ffffffffa001a860>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
>> [  278.768758]  [<ffffffff8108c06a>] warn_slowpath_null+0x1a/0x20
>> [  278.770080]  [<ffffffffa001a8e5>] rpc_exit_task+0x85/0x90 [sunrpc]
>> [  278.771439]  [<ffffffffa001b204>] __rpc_execute+0x94/0x420 [sunrpc]
>> [  278.772829]  [<ffffffffa001b5b6>] rpc_async_schedule+0x26/0x30 [sunrpc]
>> [  278.774276]  [<ffffffff810a7f95>] process_one_work+0x175/0x430
>> [  278.775573]  [<ffffffff810a866d>] worker_thread+0x11d/0x530
>> [  278.776824]  [<ffffffff810a8550>] ? rescuer_thread+0x300/0x300
>> [  278.778069]  [<ffffffff810aee18>] kthread+0xd8/0xf0
>> [  278.779114]  [<ffffffff810aed40>] ? insert_kthread_work+0x40/0x40
>> [  278.780448]  [<ffffffff816f22bc>] ret_from_fork+0x7c/0xb0
>> [  278.781621]  [<ffffffff810aed40>] ? insert_kthread_work+0x40/0x40
>> [  278.782972] ---[ end trace 3350394592832e42 ]---
>>
>> task1 (lock/unlock):
>>   --> call_decode
>>     --> rpc_exit_task (-10008)
>>       --> nfs4_release_lockowner_done
>>         --> rpc_restart_call_prepare
>>             task->tk_action = call_start;
>> task2 (umount):
>>       --> rpc_shutdown_client
>>         --> rpc_killall_tasks
>>             rovr->tk_flags |= RPC_TASK_KILLED; (set task1 to KILLED)
>> task1:
>>     --> rpc_exit_task
>>         WARN_ON(RPC_ASSASSINATED(task));
>>
>> Signed-off-by: Kinglong Mee <kinglongmee@xxxxxxxxx>
>> ---
>>  net/sunrpc/sched.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>> index 337ca85..562db0a 100644
>> --- a/net/sunrpc/sched.c
>> +++ b/net/sunrpc/sched.c
>> @@ -689,7 +689,14 @@ void rpc_exit_task(struct rpc_task *task)
>>  	if (task->tk_ops->rpc_call_done != NULL) {
>>  		task->tk_ops->rpc_call_done(task, task->tk_calldata);
>>  		if (task->tk_action != NULL) {
>> -			WARN_ON(RPC_ASSASSINATED(task));
>> +			/*
>> +			 * A known case here is,
>> +			 * The RPC_TASK_KILLED is set in rpc_killall_tasks by
>> +			 * another process.
>> +			 */
>> +			if (RPC_ASSASSINATED(task))
>> +				printk(KERN_WARNING "RPC: dead task tried to"
>> +					" walk away.\n");
>>  			/* Always release the RPC slot and buffer memory */
>>  			xprt_release(task);
>>  			rpc_reset_task_statistics(task);
>> -- 
>> 2.4.3
>>
> 
> If this is going to be changed anyway, could we maybe have some more
> useful debugging information added to the message?  Something like:
> 
>         printk(KERN_WARNING "RPC: %s to %s was killed\n", 
>                 task->tk_msg.rpc_proc->p_name, 
>                 rpc_peeraddr2str(task->tk_client, RPC_DISPLAY_ADDR));
> 
> I recently had a case where this warning was popping several times per
> minute.  The machine was both a client and a server... it turns out
> that it was actually CB_NULLs that were causing the warning, and the
> trigger for that was that several clients had the same value for the
> clientaddr mount option.
> 
> Having some more useful info from the warning message would have saved
> some time there.

If it's harmless of this case, remove the WARN_ON directly maybe better.
What's your opinion, Trond? 

thanks,
Kinglong Mee
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html