On Fri, 2009-02-27 at 21:07 -0800, Aaron Straus wrote: > On Feb 27 08:40 PM, Trond Myklebust wrote: > > > > > static inline int xprt_test_and_set_connected(struct rpc_xprt *xprt) > > > > > Index: linux-2.6.27.15-2/net/sunrpc/xprtsock.c > > > > > =================================================================== > > > > > --- linux-2.6.27.15-2.orig/net/sunrpc/xprtsock.c > > > > > +++ linux-2.6.27.15-2/net/sunrpc/xprtsock.c > > > > > @@ -1512,14 +1512,13 @@ static void xs_udp_finish_connecting(str > > > > > sk->sk_no_check = UDP_CSUM_NORCV; > > > > > sk->sk_allocation = GFP_ATOMIC; > > > > > > > > > > - xprt_set_connected(xprt); > > > > > - > > > > > /* Reset to new socket */ > > > > > transport->sock = sock; > > > > > transport->inet = sk; > > > > > > > > > > xs_set_memalloc(xprt); > > > > > > > > > > + xprt_set_connected(xprt); > > > > > write_unlock_bh(&sk->sk_callback_lock); > > > > > } > > > > > xs_udp_do_set_buffer_size(xprt); > > > > > > > > > > @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task) > > > } > > > > > > switch (status) { > > > + case -EAGAIN: > > > + xs_nospace(task); > > > + break; > > > case -ENETUNREACH: > > > case -EPIPE: > > > case -ECONNREFUSED: > > > /* When the server has died, an ICMP port unreachable message > > > * prompts ECONNREFUSED. */ > > > - break; > > > - case -EAGAIN: > > > - xs_nospace(task); > > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags); > > > break; > > > default: > > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags); > > > dprintk("RPC: sendmsg returned unrecognized error %d\n", > > > -status); > > > - break; > > > } > > > > > > return status; > > > > > > This went in to 2.6.26 which was the first time we saw the bug > > > (2.6.26.3) on kerneloops. > > > > > > I don't know if *this* is a bad commit or some other locking changed in > > > 2.6.26 which tickles the bug. > > > > It should be unrelated. If the client managed to get past the call to > > xs_sendpages(), then the UDP socket must obviously exist already. > > PS just to repeat the previous discussion. > > It seems like xs_sendpages does check if sock is NULL and returns > -ENOTCONN in that case. > > The problem is that now in xs_udp_send_request falls into the default: > section of that switch statement and tries to do the: > > > > + clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags); > > but sock is NULL, so I think that's the oops. OK, that makes a lot of sense. We should definitely fix up both xs_tcp_send_request() and xs_udp_send_request() to deal correctly with that error. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html