Re: [PATCH] sunrpc: Fix infinite loop in RPC state machine

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Resending to Trond’s correct email address.

-dros

On Dec 17, 2013, at 11:51 AM, Weston Andros Adamson <dros@xxxxxxxxxx> wrote:

> When a task enters call_refreshresult with status 0 from call_refresh and
> !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
> or max number of retries.
> 
> Instead of trying forever, make use of the retry path that other errors use.
> 
> This only seems to be possible when the crrefresh callback is gss_refresh_null,
> which only happens when destroying the context.
> 
> To reproduce:
> 
> 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
>   operations).
> 
> 2) reboot - the client will be stuck and will need to be hard rebooted
> 
> BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
> Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
> irq event stamp: 195724
> hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
> hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
> softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
> softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
> CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> Workqueue: rpciod rpc_async_schedule [sunrpc]
> task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
> RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
> RSP: 0018:ffff880079003d18  EFLAGS: 00000246
> RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
> RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
> RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
> R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
> R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
> FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
> Stack:
> ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
> ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
> ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
> Call Trace:
> [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
> [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
> [<ffffffff81052974>] process_one_work+0x211/0x3a5
> [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
> [<ffffffff81052eeb>] worker_thread+0x134/0x202
> [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
> [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
> [<ffffffff810584a0>] kthread+0xc9/0xd1
> [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
> [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
> [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
> Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00
> 
> And the output of "rpcdebug -m rpc -s all":
> 
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 call_refreshresult (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 call_refreshresult (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refreshresult (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 call_refreshresult (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 call_refresh (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> RPC:    61 call_refreshresult (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refresh (status 0)
> RPC:    61 call_refreshresult (status 0)
> RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
> Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxx>
> ---
> net/sunrpc/clnt.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index f09b7db..5f585a7 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1529,9 +1529,14 @@ call_refreshresult(struct rpc_task *task)
> 	task->tk_action = call_refresh;
> 	switch (status) {
> 	case 0:
> -		if (rpcauth_uptodatecred(task))
> +		if (rpcauth_uptodatecred(task)) {
> 			task->tk_action = call_allocate;
> -		return;
> +			return;
> +		}
> +		/*
> +		 * Use rate-limiting and a max number of retries if refresh
> +		 * had status 0 but failed to update the cred.
> +		 */
> 	case -ETIMEDOUT:
> 		rpc_delay(task, 3*HZ);
> 	case -EAGAIN:
> -- 
> 1.8.3.1 (Apple Git-46)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux