Re: [PATCH v2] net: rds: acquire refcount on TCP sockets

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Thu, 5 May 2022 00:15:46 +0900

On 2022/05/04 13:58, Tetsuo Handa wrote:
> On 2022/05/04 12:09, Eric Dumazet wrote:
>> This exit() handler _has_ to remove all known listeners, and
>> definitely cancel work queues (synchronous operation)
>> before the actual "struct net" free can happen later.
> 
> But in your report, rds_tcp_tune() is called from rds_tcp_conn_path_connect() from
> rds_connect_worker() via "struct rds_connection"->cp_conn_w work. I can see that
> rds_tcp_kill_sock() calls rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w), and
> rds_tcp_listen_stop() calls flush_workqueue(rds_wq) and flush_work(&rtn->rds_tcp_accept_w).
> 
> But I can't see how rds_tcp_exit_net() synchronously cancels all works associated
> with "struct rds_conn_path".
> 
> struct rds_conn_path {
>         struct delayed_work     cp_send_w;
>         struct delayed_work     cp_recv_w;
>         struct delayed_work     cp_conn_w;
>         struct work_struct      cp_down_w;
> }
> 
> These works are queued to rds_wq, but flush_workqueue() waits for completion only
> if already queued. What if timer for queue_delayed_work() has not expired, or was
> about to call queue_delayed_work() ? Is flush_workqueue(rds_wq) sufficient?

 rds_tcp_tune+0x5a0/0x5f0 net/rds/tcp.c:503
 rds_tcp_conn_path_connect+0x489/0x880 net/rds/tcp_connect.c:127
 rds_connect_worker+0x1a5/0x2c0 net/rds/threads.c:176
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289

rds_tcp_conn_path_connect is referenced by
"struct rds_transport rds_tcp_transport"->conn_path_connect.
It is invoked by

  ret = conn->c_trans->conn_path_connect(cp)

in rds_connect_worker().

rds_connect_worker is referenced by "struct rds_conn_path"->cp_conn_w
via INIT_DELAYED_WORK().

queue_delayed_work(rds_wq, &cp->cp_conn_w, *) is called by
rds_queue_reconnect() or rds_conn_path_connect_if_down().

If rds_conn_path_connect_if_down() were called from
rds_tcp_accept_one_path() from rds_tcp_accept_one(),
rds_tcp_tune() from rds_tcp_accept_one() was already called
before rds_tcp_tune() from rds_tcp_conn_path_connect() is called.
Since the addition on 0 was not reported at rds_tcp_tune() from
rds_tcp_accept_one(), what Eric is reporting cannot be from
rds_tcp_accept_one() from rds_tcp_accept_worker().

Despite rds_tcp_kill_sock() sets rtn->rds_tcp_listen_sock = NULL and
waits for rds_tcp_accept_one() from rds_tcp_accept_worker() to complete
using flush_workqueue(rds_wq), what Eric is reporting is different from
what syzbot+694120e1002c117747ed was reporting.

> 
> Anyway, if rds_tcp_kill_sock() can somehow guarantee that all works are completed
> or cancelled, the fix would look like something below?

I think it is OK to apply below diff in order to avoid addition on 0 problem, but
it is not proven that kmem_cache_free() is not yet called. What should we do?

> 
>  net/rds/tcp.c         | 11 ++++++++---
>  net/rds/tcp.h         |  2 +-
>  net/rds/tcp_connect.c |  5 ++++-
>  net/rds/tcp_listen.c  |  5 ++++-
>  4 files changed, 17 insertions(+), 6 deletions(-)
>