Re: nfsroot client will not start firefox or thunderbird from 3.4.0 nfsserver

On Sun, 10 Jun 2012 15:56:11 +0200
Hans de Bruin <jmdebruin@xxxxxxxxx> wrote:

> On 06/10/2012 11:52 AM, Jeff Layton wrote:
> > On Tue, 29 May 2012 00:19:34 +0200
> > Hans de Bruin <jmdebruin@xxxxxxxxx> wrote:
> >
> >> I just upgraded my home server from kernel 3.3.5 to 3.4.0 and ran into
> >> some trouble. My laptop, an nfsroot client, will not run firefox and
> >> thunderbird anymore. When I start these programs from an xterm, the
> >> cursor goes to the next line and waits indefinitely.
> >>
> >> I do not know if there is any order in lsof's output. An lsof | grep
> >> firefox or thunderbird shows ......./.parentlock as the last line.
> >>
> >> It does not matter whether the client is running a 3.4.0 or a 3.3.0
> >> kernel, or if the server is running on top of xen or not.
> >>
> >> There is some noise in the server's dmesg:
> >>
> >> [  241.256684] INFO: task kworker/u:2:801 blocked for more than 120 seconds.
> >> [  241.256691] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [  241.256698] kworker/u:2     D 000000000000000c     0   801      2 0x00000000
> >> [  241.256710]  ffff8801390e5cf0 0000000000000046 0000000000012d00 0000000000012d00
> >> [  241.256721]  0000000000012d00 ffff880139f8bd50 0000000000012d00 ffff8801390e5fd8
> >> [  241.256732]  ffff8801390e5fd8 0000000000012d00 ffff880139ce4420 ffff880139f8bd50
> >> [  241.256743] Call Trace:
> >> [  241.256759]  [<ffffffff8158733e>] schedule+0x64/0x66
> >> [  241.256769]  [<ffffffff8120184e>] cld_pipe_upcall+0x95/0xd1
> >> [  241.256780]  [<ffffffff811fbd8d>] ? nfsd4_exchange_id+0x23e/0x23e
> >> [  241.256789]  [<ffffffff81201d06>] nfsd4_cld_grace_done+0x50/0x8a
> >> [  241.256798]  [<ffffffff81202233>] nfsd4_record_grace_done+0x18/0x1a
> >> [  241.256807]  [<ffffffff811fbdd7>] laundromat_main+0x4a/0x213
> >> [  241.256818]  [<ffffffff8106a06b>] ? need_resched+0x1e/0x28
> >> [  241.256826]  [<ffffffff8158725d>] ? __schedule+0x49d/0x4b5
> >> [  241.256835]  [<ffffffff811fbd8d>] ? nfsd4_exchange_id+0x23e/0x23e
> >> [  241.256844]  [<ffffffff8105be2d>] process_one_work+0x190/0x28d
> >> [  241.256854]  [<ffffffff8105ca67>] worker_thread+0x105/0x189
> >> [  241.256862]  [<ffffffff81587b8d>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
> >> [  241.256872]  [<ffffffff8105c962>] ? manage_workers.clone.17+0x173/0x173
> >> [  241.256881]  [<ffffffff810604b0>] kthread+0x8a/0x92
> >> [  241.256891]  [<ffffffff815899a4>] kernel_thread_helper+0x4/0x10
> >> [  241.256900]  [<ffffffff81060426>] ? kthread_freezable_should_stop+0x47/0x47
> >> [  241.256909]  [<ffffffff815899a0>] ? gs_change+0x13/0x13
> >>
> >> or xenified:
> >>
> >>
> >> [  240.563448] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [  240.563458] kworker/u:2     D ffff88007fc12d00     0   808      2 0x00000000
> >> [  240.563479]  ffff88007532fcf0 0000000000000246 0000000000012d00 0000000000012d00
> >> [  240.563504]  0000000000012d00 ffff880075f7caf0 0000000000012d00 ffff88007532ffd8
> >> [  240.563530]  ffff88007532ffd8 0000000000012d00 ffffffff81813020 ffff880075f7caf0
> >> [  240.563555] Call Trace:
> >> [  240.563578]  [<ffffffff8158733e>] schedule+0x64/0x66
> >> [  240.563594]  [<ffffffff8120184e>] cld_pipe_upcall+0x95/0xd1
> >> [  240.563611]  [<ffffffff811fbd8d>] ? nfsd4_exchange_id+0x23e/0x23e
> >> [  240.563625]  [<ffffffff81201d06>] nfsd4_cld_grace_done+0x50/0x8a
> >> [  240.563640]  [<ffffffff81202233>] nfsd4_record_grace_done+0x18/0x1a
> >> [  240.563654]  [<ffffffff811fbdd7>] laundromat_main+0x4a/0x213
> >> [  240.563670]  [<ffffffff8100d085>] ? xen_spin_unlock+0x12/0x30
> >> [  240.563685]  [<ffffffff811fbd8d>] ? nfsd4_exchange_id+0x23e/0x23e
> >> [  240.563700]  [<ffffffff8105be2d>] process_one_work+0x190/0x28d
> >> [  240.563714]  [<ffffffff8100d337>] ? xen_spin_lock+0xb/0xd
> >> [  240.563729]  [<ffffffff8105ca67>] worker_thread+0x105/0x189
> >> [  240.563743]  [<ffffffff81587b8d>] ? _raw_spin_unlock_irqrestore+0x1a/0x1d
> >> [  240.563758]  [<ffffffff8105c962>] ? manage_workers.clone.17+0x173/0x173
> >> [  240.563772]  [<ffffffff810604b0>] kthread+0x8a/0x92
> >> [  240.563787]  [<ffffffff815899a4>] kernel_thread_helper+0x4/0x10
> >> [  240.563802]  [<ffffffff81587f38>] ? retint_restore_args+0x5/0x6
> >> [  240.563816]  [<ffffffff815899a0>] ? gs_change+0x13/0x13
> >>
> >>
> >
> > It sounds like you're not running the new nfsdcld daemon on the server,
> > and /var/lib/nfs/v4recovery does not exist. Is that correct?
> >
> 
> Yes, that's correct. When I create the /var/lib/nfs/v4recovery directory my
> problems are gone.
> 
> On the server and the client v3 and v4 are compiled into the kernel. All
> mounts are v3. Should I either remove v4 from the kernels or upgrade
> nfs-utils-1.2.3 to something newer?
> 
> (a search on the linux-nfs wiki for nfsdcld does not return any hits)
> 


Just creating the legacy state tracking directory should be enough to
work around the problem for now. Eventually you'll want to move to a
newer nfs-utils (1.2.5 or later) and run nfsdcld to handle the v4
client tracking. Alternatively, you can just build the kernel without v4
support, but that's really not necessary.
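
As a rough sketch of that workaround, something like the following would
create the directory if it is missing. The helper name ensure_recovery_dir
and the 0700 mode are my own assumptions for illustration, not anything
shipped with nfs-utils; only the path comes from this thread.

```python
import os

# Legacy NFSv4 client-tracking directory discussed in this thread.
DEFAULT_RECOVERY_DIR = "/var/lib/nfs/v4recovery"


def ensure_recovery_dir(path=DEFAULT_RECOVERY_DIR, mode=0o700):
    """Create the state-tracking directory if it does not exist.

    Returns True if the directory was created, False if it was already
    there. The default mode is an assumption; adjust to local policy.
    """
    if os.path.isdir(path):
        return False
    # Also creates missing parent directories (with default permissions).
    os.makedirs(path, mode=mode)
    return True


# Hypothetical usage, run as root on the NFS server:
#     ensure_recovery_dir()
```

Calling it a second time is harmless: it simply reports that the directory
is already present.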

Now, that said...this upcall should be timing out in 30s, so it's not
clear to me why the daemon is hanging for 120s. I'll have to see if I
can reproduce this and track down the problem.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>