On Fri, 2020-06-05 at 10:10 +0800, Zhengbin (OSKernel) wrote: > The complete process is like this: > > xprt_destroy > wait_on_bit_lock(&xprt->state, XPRT_LOCKED, > TASK_UNINTERRUPTIBLE) > -->getlock > del_timer_sync(&xprt->timer) -->del xprt->timer > INIT_WORK(&xprt->task_cleanup, xprt_destroy_cb) > > xprt_destroy_cb > xs_destroy(xprt->ops->destroy) > cancel_delayed_work_sync -->will call > transport->connect_worker, whose callback is xs_udp_setup_socket > xs_xprt_free(xprt) -->free xprt > > xs_udp_setup_socket > sock = xs_create_sock > xprt_unlock_connect > if (!test_bit(XPRT_LOCKED, &xprt->state)) -->state is > XPRT_LOCKED > xprt_schedule_autodisconnect > mod_timer > internal_add_timer -->insert xprt->timer to base timer > list The call to xprt_lock_connect() in xs_connect() is supposed to ensure exclusion w.r.t. wait_on_bit_lock(). In other words if the transport- >connect_worker is actually queued, it is also supposed to be holding the XPRT_LOCK and ensuring that we can't get into the above situation. Why is that not the case here? > > On 2020/6/4 23:39, Trond Myklebust wrote: > > On Thu, 2020-06-04 at 22:49 +0800, Zheng Bin wrote: > > > If RPC use udp as it's transport protocol, transport- > > > >connect_worker > > > will call xs_udp_setup_socket. > > > xs_udp_setup_socket > > > sock = xs_create_sock > > > if (IS_ERR(sock)) > > > goto out; > > > out: > > > xprt_unlock_connect > > > xprt_schedule_autodisconnect > > > mod_timer > > > internal_add_timer -->insert xprt->timer to base > > > timer > > > list > > > > > > xs_destroy > > > cancel_delayed_work_sync(&transport->connect_worker) > > > xs_xprt_free(xprt) -->free xprt > > > > > > Thus use-after-free will happen. > > > > > > Signed-off-by: Zheng Bin <zhengbin13@xxxxxxxxxx> > > > --- > > > net/sunrpc/xprtsock.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > > index 845d0be805ec..c796808e7f7a 100644 > > > --- a/net/sunrpc/xprtsock.c > > > +++ b/net/sunrpc/xprtsock.c > > > @@ -1242,6 +1242,7 @@ static void xs_destroy(struct rpc_xprt > > > *xprt) > > > dprintk("RPC: xs_destroy xprt %p\n", xprt); > > > > > > cancel_delayed_work_sync(&transport->connect_worker); > > > + del_timer_sync(&xprt->timer); > > > xs_close(xprt); > > > cancel_work_sync(&transport->recv_worker); > > > cancel_work_sync(&transport->error_worker); > > > -- > > > 2.26.0.106.g9fadedd > > > > > I'm confused. How can this happen given that xprt_destroy() first > > takes > > the XPRT_LOCK, and then deletes xprt->timer? > > > > Right now, the socket code knows nothing about the details of xprt- > > > timer and what it is used for. I'd prefer to keep it that way if > > possible. > > -- Trond Myklebust CTO, Hammerspace Inc 4984 El Camino Real, Suite 208 Los Altos, CA 94022 www.hammer.space