Re: Fw: Deadlock regression in v2.6.31.6

On Wed, 2009-11-25 at 09:56 +0100, Stephen R. van den Berg wrote: 
> > The problem vanishes as soon as I run v2.6.31.5 (neither kernel contains
> > any significant modules).
> 
> I did a bisect, and it turns out that the problem is there in 2.6.31.5 as well.

This makes sense. There have been no RPC level changes between 2.6.31.5
and 2.6.31.6.

> The traces are still valid.  This is on an NFS mounted root partition
> (NFSv3 over TCP), no other filesystems mounted (except a tmpfs here or
> there).  I turned on some debugging in net/sunrpc/sched.c, and the
> following happens when I execute "apt-get --reinstall install man-db"
> (it happens every time, so it is very reproducible):
> 
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7827)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7828)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7830)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7831)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7833)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7835)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7836)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7838)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7839)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7841)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7842)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> RPC:  9697 __rpc_execute flags=0x1 cf849c44
> RPC:  9697 sleep_on(queue "xprt_pending" time 7844)
> RPC:  9697 added to queue cfa72d88 "xprt_pending"
> RPC:  9697 setting alarm for 60000 ms
> RPC:  9697 __rpc_wake_up_task (now 7845)
> RPC:  9697 disabling timer
> RPC:  9697 removed from queue cfa72d88 "xprt_pending"
> RPC:       __rpc_wake_up_task done
> 
> Ad infinitum.
> The cf849c44 is the task parameter which I printed as well.
> It looks like an endless loop in the state machine.
> The kernel hangs at this point, the only way to get out of there is
> using SysBreak.
> I tried debugging it further, but I got lost in the state machine (I think).

This just means that the RPC client is waiting for a reply from the NFS
server: the task keeps going back to sleep on the "xprt_pending" queue.
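The repeating trace can be summarized mechanically. A minimal Python sketch, assuming the sunrpc debug lines keep exactly the `sleep_on(queue "..." time N)` format seen in the quoted log (the helper names `sleep_events` and `summarize` are hypothetical), counts how often each task goes back to sleep on each queue and the gaps between successive sleeps:

```python
import re
from collections import Counter

# Matches sunrpc debug lines such as (a leading "> " quote marker is fine):
#   RPC:  9697 sleep_on(queue "xprt_pending" time 7828)
SLEEP_RE = re.compile(r'RPC:\s+(\d+) sleep_on\(queue "([^"]+)" time (\d+)\)')


def sleep_events(lines):
    """Yield (task_id, queue_name, time) for every sleep_on line."""
    for line in lines:
        m = SLEEP_RE.search(line)
        if m:
            yield int(m.group(1)), m.group(2), int(m.group(3))


def summarize(lines):
    """Count sleeps per (task, queue) and list the time gaps between them."""
    counts = Counter()
    times = {}
    for task, queue, t in sleep_events(lines):
        counts[(task, queue)] += 1
        times.setdefault((task, queue), []).append(t)
    # Difference between each pair of consecutive sleep times.
    gaps = {key: [b - a for a, b in zip(ts, ts[1:])] for key, ts in times.items()}
    return counts, gaps
```

Fed the quoted excerpt, it reports six sleeps of task 9697 on "xprt_pending", with the wake-ups arriving only a few ticks apart rather than after the 60000 ms alarm.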

Does 'netstat -t' show that there is an active TCP connection to the
server's NFS port?
Does Wireshark show that the client should have received a reply?
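The first question can also be answered by reading the kernel's IPv4 TCP table in /proc/net/tcp directly. A minimal Python sketch, assuming an IPv4 mount, a little-endian client (the address fields are printed in host byte order) and the standard NFS port 2049; the function names are hypothetical:

```python
import socket
import struct

NFS_PORT = 2049          # assumption: the server uses the standard NFS port
TCP_ESTABLISHED = "01"   # state code for ESTABLISHED in /proc/net/tcp


def _decode(addr):
    """Decode an 'IIIIIIII:PPPP' hex field (little-endian host assumed)."""
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def nfs_connections(proc_text, port=NFS_PORT):
    """Return (local, remote) address pairs of established connections to port."""
    found = []
    for line in proc_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        remote_addr = _decode(remote)
        if remote_addr[1] == port and state == TCP_ESTABLISHED:
            found.append((_decode(local), remote_addr))
    return found
```

On the client, `nfs_connections(open("/proc/net/tcp").read())` lists the established connections to port 2049; an empty list means there is no such IPv4 connection at that moment.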

Trond
