Re: RPC Race Condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 22, 2012 at 06:28:02PM +0000, Myklebust, Trond wrote:
> Hi Chris,
> 
> Thanks for helping to debug this, and for providing such a thorough
> analysis! This is great work!
> 
> Does adding a check in xs_error_report() for xprt_connecting(xprt) so
> that we skip the call to xprt_wake_pending_tasks() if true, help to
> resolve the race?

This is what I changed:

--- a/net/sunrpc/xprtsock.c     2012-08-14 08:47:16.000000000 -0400
+++ b/net/sunrpc/xprtsock.c     2012-10-22 14:50:09.237725498 -0400
@@ -1571,7 +1571,8 @@
        dprintk("RPC:       %s client %p...\n"
                        "RPC:       error %d\n",
                        __func__, xprt, sk->sk_err);
-       xprt_wake_pending_tasks(xprt, -EAGAIN);
+       if (!xprt_connecting(xprt))
+               xprt_wake_pending_tasks(xprt, -EAGAIN);
 out:
        read_unlock_bh(&sk->sk_callback_lock);
 }

Unfortunately it does not resolve it.  It appears that does stop the wakeup
from happening where I detailed it before, but we still wake up too early.  The
following is the call stack from SystemTap catching the call to `rpc_make_runnable':

 0xffffffffa0434840 : rpc_make_runnable+0x0/0x80 [sunrpc]
 0xffffffffa0434b5e : rpc_wake_up_task_queue_locked+0x18e/0x270 [sunrpc]
 0xffffffffa0434c8e : rpc_wake_up_status+0x4e/0x80 [sunrpc]
 0xffffffffa042d8ec : xprt_wake_pending_tasks+0x2c/0x30 [sunrpc]
 0xffffffffa042da56 : xprt_disconnect_done+0x46/0x70 [sunrpc]
 0xffffffffa04305b0 : xs_tcp_setup_socket+0x330/0x4c0 [sunrpc]
 0xffffffff8108c760 : worker_thread+0x170/0x2a0 [kernel]
 0xffffffff81091d66 : kthread+0x96/0xa0 [kernel]
 0xffffffff8100c14a : child_rip+0xa/0x20 [kernel]

However, its mildly confusing to me since I cannot follow that exact
call stack in the code.  The nearest I can find to that is the
following:

  rpc_make_runnable
  rpc_wake_up_task_queue_locked
  rpc_wake_up_status
  xprt_wake_pending_tasks
  xprt_disconnect_done
  xs_sock_mark_closed
  xs_abort_connection
  xs_tcp_reuse_connection
  xs_tcp_setup_socket

As far as I can tell, we get here because the call to `kernel_connect'
from within `xs_abort_connection' by rpciod is returning 0 sometimes
(and EINPROGRESS others).  When it returns 0, we enter this block:

  if (!result)
          xs_sock_mark_closed(&transport->xprt);
  else
    dprintk("RPC:       AF_UNSPEC connect return code %d\n", result);

and that eventually gets us woken up via the call stack above.  The end
result is the same, we wake up and move on to `call_transmit' before
rpciod is done connecting the socket.

I'll do some more tracing and see what else I can come up with.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux