Re: [PATCH] sunrpc: need delete xprt->timer in xs_destroy

"Zhengbin (OSKernel)" <zhengbin13@xxxxxxxxxx> · Fri, 5 Jun 2020 10:10:29 +0800

The complete process is like this:

xprt_destroy
  wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE)  
-->getlock
  del_timer_sync(&xprt->timer)   -->del xprt->timer
  INIT_WORK(&xprt->task_cleanup, xprt_destroy_cb)

xprt_destroy_cb
  xs_destroy(xprt->ops->destroy)
    cancel_delayed_work_sync     -->will call 
transport->connect_worker, whose callback is xs_udp_setup_socket
    xs_xprt_free(xprt)                    -->free xprt

xs_udp_setup_socket
  sock = xs_create_sock
  xprt_unlock_connect
      if (!test_bit(XPRT_LOCKED, &xprt->state)) -->state is XPRT_LOCKED
      xprt_schedule_autodisconnect
        mod_timer
          internal_add_timer  -->insert xprt->timer to base timer list

On 2020/6/4 23:39, Trond Myklebust wrote:
On Thu, 2020-06-04 at 22:49 +0800, Zheng Bin wrote:
If RPC use udp as it's transport protocol, transport->connect_worker
will call xs_udp_setup_socket.
xs_udp_setup_socket
   sock = xs_create_sock
   if (IS_ERR(sock))
     goto out;
   out:
     xprt_unlock_connect
       xprt_schedule_autodisconnect
         mod_timer
           internal_add_timer  -->insert xprt->timer to base timer
list

xs_destroy
   cancel_delayed_work_sync(&transport->connect_worker)
   xs_xprt_free(xprt)           -->free xprt

Thus use-after-free will happen.

Signed-off-by: Zheng Bin <zhengbin13@xxxxxxxxxx>
---
  net/sunrpc/xprtsock.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 845d0be805ec..c796808e7f7a 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1242,6 +1242,7 @@ static void xs_destroy(struct rpc_xprt *xprt)
  	dprintk("RPC:       xs_destroy xprt %p\n", xprt);

  	cancel_delayed_work_sync(&transport->connect_worker);
+	del_timer_sync(&xprt->timer);
  	xs_close(xprt);
  	cancel_work_sync(&transport->recv_worker);
  	cancel_work_sync(&transport->error_worker);
--
2.26.0.106.g9fadedd

I'm confused. How can this happen given that xprt_destroy() first takes
the XPRT_LOCK, and then deletes xprt->timer?

Right now, the socket code knows nothing about the details of xprt-
timer and what it is used for. I'd prefer to keep it that way if
possible.