Re: [2.6.31] NFS4ERR_GRACE unhandled...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Trond,

On Mon, Sep 28, 2009 at 7:16 PM, Trond Myklebust
<Trond.Myklebust@xxxxxxxxxx> wrote:
> On Sat, 2009-09-26 at 19:14 +0100, Daniel J Blueman wrote:
>> Hi Trond,
>>
>> After rebooting my 2.6.31 NFS4 server, I see a list of NFS kernel
>> errors [1] on the 2.6.31 client corresponding to NFS4ERR_GRACE, so
>> lock or file state recovery failed. Is this expected noting that I
>> have an internal firewall allowing incoming TCP port 2049 on the
>> server, and no firewall on the client, however I can't see how it can
>> thus be callback related?
>
> No. It looks as if your server rebooted while the client was recovering
> an expired lease.
>
> The following patch should prevent future occurrences of this bug...
>
> Cheers
>  Trond
> ------------------------------------------------------------------
> NFSv4: Handle NFS4ERR_GRACE when recovering an expired lease.
>
> From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>
> If our lease expires, and the server subsequently reboot, we need to be
> able to handle the case where the server refuses to let us recover state,
> because it is in the grace period.

On the client, I didn't see the error messages with this patch,
however I did see firefox (via sqlite) continue to hang [1] (after
other processes continued), and an unusual level of activity with
rpciod/0 and rpciod/1 kernel threads. Other NFS-related kernel thread
state is given.

I'm able to reproduce [2] this behaviour with 2.6.31-rc1 and your
patch; let me know if you're unable to.

Thanks,
  Daniel

--- [1]

# echo t >/proc/sysrq-trigger
# dmesg
<aargh, missing rpciod/0 and rpciod/1 which are in a unexpected state,
since CONFIG_LOG_BUF_SHIFT is only 18 by default and crashdump doesn't
work without the vmcoreinfo file>
nfsv4.0-svc   S ffff880126e3ed70  4600  2628      2 0x00000000
 ffff88011fe25d20 0000000000000046 000000000000031a ffff88011fd5a5d8
 ffff88011fd5a5a0 ffffffff825cb770 ffff88011fd5af68 ffffffff00000320
 0000000000000001 ffffffff8220e0a0 ffff88011fe24000 0000000000017768
Call Trace:
 [<ffffffff8112e7ea>] ? kfree+0xea/0x1c0
 [<ffffffff816394e5>] schedule_timeout+0x275/0x360
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff81067158>] ? local_bh_enable_ip+0x98/0x130
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff815ff433>] ? svc_recv+0x793/0x8a0
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff815ff43c>] svc_recv+0x79c/0x8a0
 [<ffffffff81054c00>] ? default_wake_function+0x0/0x10
 [<ffffffff8121b803>] nfs4_callback_svc+0x43/0xb0
 [<ffffffff8121b7c0>] ? nfs4_callback_svc+0x0/0xb0
 [<ffffffff8107eb3e>] kthread+0x8e/0xa0
 [<ffffffff8100d15a>] child_rip+0xa/0x20
 [<ffffffff8100ca94>] ? restore_args+0x0/0x30
 [<ffffffff8107eab0>] ? kthread+0x0/0xa0
 [<ffffffff8100d150>] ? child_rip+0x0/0x20
firefox       S 0000000000000000  2512  2695      1 0x00000000
 ffff880117229c38 0000000000000046 0000000000000000 00000055b199802a
 ffff8801171daf30 ffffffff821f0210 0000000000019400 ffff88002efe0bd8
 0000000000000000 00000000ffffe926 ffff880117228000 0000000000017768
Call Trace:
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  4824  2698      1 0x00000000
 ffff880129d71a58 0000000000000046 0000000000000000 ffffffff81096872
 0000000000000002 ffffffff82204790 0000000000000042 ffff88002efe0bd8
 0000000000000000 00000000ffffe731 ffff880129d70000 0000000000017768
Call Trace:
 [<ffffffff81096872>] ? __lock_acquire+0x482/0x1440
 [<ffffffff8163a445>] schedule_hrtimeout_range+0x145/0x170
 [<ffffffff8163bf35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff8107f2a1>] ? add_wait_queue+0x41/0x50
 [<ffffffff8114dbdf>] ? __pollwait+0x6f/0xf0
 [<ffffffff8114da94>] poll_schedule_timeout+0x34/0x50
 [<ffffffff8114e89f>] do_sys_poll+0x40f/0x520
 [<ffffffff815536a2>] ? dev_queue_xmit+0x52/0x520
 [<ffffffff8114db70>] ? __pollwait+0x0/0xf0
 [<ffffffff8114dc60>] ? pollwake+0x0/0x60
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff810925de>] ? put_lock_stats+0xe/0x30
 [<ffffffff8163bf55>] ? _spin_unlock_irqrestore+0x65/0x80
 [<ffffffff810645d2>] ? current_fs_time+0x22/0x30
 [<ffffffff81157118>] ? mnt_drop_write+0x38/0x60
 [<ffffffff81152bf5>] ? touch_atime+0x135/0x180
 [<ffffffff81145925>] ? pipe_read+0x2d5/0x4e0
 [<ffffffff8113ce12>] ? do_sync_read+0xf2/0x130
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8100bffa>] ? sysret_check+0x2e/0x69
 [<ffffffff8114eb97>] sys_poll+0x77/0x110
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S ffff880126c28000  5944  2699      1 0x00000000
 ffff88011729bc38 0000000000000046 ffff88011729bb88 ffffffff8163bf35
 ffff88011729bd28 0000024240c36955 ffff88011729bc28 ffffffff81082b30
 ffff88011729bbb8 ffffffff8109044a ffff88011729a000 0000000000017768
Call Trace:
 [<ffffffff8163bf35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff81082b30>] ? __hrtimer_start_range_ns+0x170/0x480
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff81082380>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8163bf9b>] ? _spin_unlock_irq+0x2b/0x60
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8163bfac>] ? _spin_unlock_irq+0x3c/0x60
 [<ffffffff81049baf>] ? finish_task_switch+0x6f/0xf0
 [<ffffffff81049b40>] ? finish_task_switch+0x0/0xf0
 [<ffffffff816388bb>] ? thread_return+0x98/0x82d
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095a01>] ? trace_hardirqs_on_caller+0x91/0x1a0
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  5944  2700      1 0x00000000
 ffff8801170b9c38 0000000000000046 0000000000000000 ffffffff8163bf35
 ffff8801170b9d28 0000035df665233f ffff8801170b9c28 ffff88002efe0bd8
 0000000000000000 0000000100016139 ffff8801170b8000 0000000000017768
Call Trace:
 [<ffffffff8163bf35>] ? _spin_unlock_irqrestore+0x45/0x80
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff81082380>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095a01>] ? trace_hardirqs_on_caller+0x91/0x1a0
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       D 0000000000000002  3832  2710      1 0x00000000
 ffff880117283b58 0000000000000046 ffff88011fd72fa0 ffffffff821e0250
 0000000000019400 ffff88011fd725a0 0000000000000001 0000000000000007
 0000000000000006 0000000000000002 ffff880117282000 0000000000017768
Call Trace:
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810ef7e0>] ? sync_page+0x0/0x60
 [<ffffffff81639092>] io_schedule+0x42/0x60
 [<ffffffff810ef825>] sync_page+0x45/0x60
 [<ffffffff816397c7>] __wait_on_bit+0x57/0x80
 [<ffffffff810f1e00>] ? find_get_pages_tag+0x130/0x1d0
 [<ffffffff810f1cd0>] ? find_get_pages_tag+0x0/0x1d0
 [<ffffffff810ef95e>] wait_on_page_bit+0x6e/0x80
 [<ffffffff8107efa0>] ? wake_bit_function+0x0/0x40
 [<ffffffff810fb150>] ? pagevec_lookup_tag+0x20/0x30
 [<ffffffff810efad4>] wait_on_page_writeback_range+0xf4/0x1a0
 [<ffffffff81096872>] ? __lock_acquire+0x482/0x1440
 [<ffffffff810fa1bf>] ? do_writepages+0x1f/0x40
 [<ffffffff810efc2b>] ? __filemap_fdatawrite_range+0x5b/0x60
 [<ffffffff810efba6>] filemap_fdatawait+0x26/0x30
 [<ffffffff810efed4>] filemap_write_and_wait+0x44/0x50
 [<ffffffff811f2794>] nfs_setattr+0x174/0x190
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff810925de>] ? put_lock_stats+0xe/0x30
 [<ffffffff81154d12>] ? notify_change+0x1b2/0x2e0
 [<ffffffff81154c64>] notify_change+0x104/0x2e0
 [<ffffffff8113bf91>] do_truncate+0x61/0x90
 [<ffffffff8113c620>] sys_ftruncate+0xd0/0xe0
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  5048  2713      1 0x00000000
 ffff88011727fc38 0000000000000046 0000000000000000 000000209bf0709a
 ffff88011febd4d0 ffffffff821f0210 0000000000019400 ffff88002efe0bd8
 0000000000000000 00000000ffffbe3c ffff88011727e000 0000000000017768
Call Trace:
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8153d795>] ? sockfd_lookup_light+0x25/0x80
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S ffff880126c2a5a0  6216  2714      1 0x00000000
 ffff880117275c38 0000000000000046 ffff880117275b88 00000012ae79dbd1
 ffff880126c2af30 ffffffff821f0210 0000000000019400 ffff880126c2a5a0
 ffff880117275bb8 ffffffff8109044a ffff880117274000 0000000000017768
Call Trace:
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff812d4f7e>] ? _raw_spin_unlock+0x5e/0xb0
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff810925de>] ? put_lock_stats+0xe/0x30
 [<ffffffff8163bf9b>] ? _spin_unlock_irq+0x2b/0x60
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff81095b1d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
firefox       S 0000000000000000  2864  2715      1 0x00000000
 ffff880117297c38 0000000000000046 0000000000000000 000000209b22fd4e
 ffff8801347ad4d0 ffffffff821f0210 0000000000019400 ffff88002efe0bd8
 0000000000000000 00000000ffffbe3c ffff880117296000 0000000000017768
Call Trace:
 [<ffffffff8109aa0d>] futex_wait_queue_me+0xbd/0x100
 [<ffffffff8109b068>] futex_wait+0x168/0x2f0
 [<ffffffff8109caa9>] do_futex+0x109/0xac0
 [<ffffffff8107ef60>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8109044a>] ? get_lock_stats+0x2a/0x60
 [<ffffffff8109d526>] sys_futex+0xc6/0x170
 [<ffffffff81095abd>] ? trace_hardirqs_on_caller+0x14d/0x1a0
 [<ffffffff8163b30e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
kthreadd      D 00000000ffffffff  6816 23477      2 0x00000000
 ffff8801344b9eb0 0000000000000046 ffff8801344b9e78 ffff8801344b9e74
 ffff8801344b9e00 ffff88002ee17cc0 ffff88002f1e0bc0 ffff88002efe0bd8
 0000000000000000 00000001000354b3 ffff8801344b8000 0000000000017768
Call Trace:
 [<ffffffff812181f0>] ? nfs4_run_state_manager+0x0/0x4c0
 [<ffffffff8107eb22>] kthread+0x72/0xa0
 [<ffffffff8100d15a>] child_rip+0xa/0x20
 [<ffffffff8100ca94>] ? restore_args+0x0/0x30
 [<ffffffff8107eab0>] ? kthread+0x0/0xa0
 [<ffffffff8100d150>] ? child_rip+0x0/0x20

--- [2]

1. ensure NFS4 client processes are updating databases/files (eg run
firefox or renderer)
2. reboot server (perhaps unneeded)
3. on server, stop nfs-kernel service for some minutes
4. start nfs-kernel service again
5. verify client application resume without dataloss
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux