On Thu, 23 Dec 2021, Xin Long wrote:

> This patch is to delay the endpoint free by calling call_rcu() to fix
> another use-after-free issue in sctp_sock_dump():
>
>   BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
>   Call Trace:
>    __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
>    lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
>    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>    _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
>    spin_lock_bh include/linux/spinlock.h:334 [inline]
>    __lock_sock+0x203/0x350 net/core/sock.c:2253
>    lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
>    lock_sock include/net/sock.h:1492 [inline]
>    sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
>    sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
>    sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
>    __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
>    inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
>    netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
>    __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
>    netlink_dump_start include/linux/netlink.h:216 [inline]
>    inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
>    __sock_diag_cmd net/core/sock_diag.c:232 [inline]
>    sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
>    netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
>    sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
>
> This issue occurs when asoc is peeled off and the old sk is freed after
> getting it by asoc->base.sk and before calling lock_sock(sk).
>
> To prevent the sk free, as a holder of the sk, ep should be alive when
> calling lock_sock(). This patch uses call_rcu() and moves sock_put and
> ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
> hold the ep under rcu_read_lock in sctp_transport_traverse_process().
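For readers less familiar with this lifetime pattern, here is a minimal userspace sketch of the scheme described above: the last put releases the port immediately but defers the actual free through an RCU-style callback, and a reader that still has a pointer can only take a reference while the count is non-zero. Everything in it is hypothetical: struct ep_model, ep_try_hold(), ep_put(), fake_call_rcu() and fake_grace_period_end() are illustrative stand-ins, not the kernel's sctp_endpoint_hold() or call_rcu(), and the grace period is simulated by an explicit flush rather than real RCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an endpoint; NOT struct sctp_endpoint. */
struct ep_model {
	atomic_int refcnt;   /* models the endpoint reference count */
	bool port_released;  /* set immediately when the last reference is dropped */
	bool freed;          /* set only by the deferred (RCU-style) callback */
};

typedef void (*ep_cb_t)(struct ep_model *);

/* Simulated call_rcu() queue: callbacks run only at fake_grace_period_end(). */
#define MAX_PENDING 16
static struct { struct ep_model *ep; ep_cb_t cb; } pending[MAX_PENDING];
static size_t npending;

static void fake_call_rcu(struct ep_model *ep, ep_cb_t cb)
{
	assert(npending < MAX_PENDING);
	pending[npending].ep = ep;
	pending[npending].cb = cb;
	npending++;
}

/* Assumption: every pre-existing reader has left its read-side section. */
static void fake_grace_period_end(void)
{
	for (size_t i = 0; i < npending; i++)
		pending[i].cb(pending[i].ep);
	npending = 0;
}

static void ep_init(struct ep_model *ep)
{
	atomic_init(&ep->refcnt, 1);
	ep->port_released = false;
	ep->freed = false;
}

/* Deferred destructor: per the changelog, sock_put() and the ep free move here. */
static void ep_destroy_rcu(struct ep_model *ep)
{
	ep->freed = true;
}

/* Take a reference only if the endpoint is not already dying. */
static bool ep_try_hold(struct ep_model *ep)
{
	int old = atomic_load(&ep->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ep->refcnt, &old, old + 1))
			return true;   /* alive: caller now owns a reference */
	}
	return false;              /* dead: caller must skip this endpoint */
}

static void ep_put(struct ep_model *ep)
{
	int old = atomic_fetch_sub(&ep->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		/* The port release itself is not delayed... */
		ep->port_released = true;
		/* ...only the memory (and the sk reference) is freed later. */
		fake_call_rcu(ep, ep_destroy_rcu);
	}
}
```

A reader that found the endpoint before the final ep_put() can still safely call ep_try_hold() on it until fake_grace_period_end() runs; the call returns false, so the reader skips the endpoint instead of touching a socket that may already be gone. The kernel side relies on real RCU (call_rcu() and rcu_read_lock(), as the changelog states) rather than this explicit queue.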
>
> If sctp_endpoint_hold() returns true, it means this ep is still alive
> and we have held it and can continue to dump it; If it returns false,
> it means this ep is dead and can be freed after rcu_read_unlock, and
> we should skip it.
>
> In sctp_sock_dump(), after locking the sk, if this ep is different from
> tsp->asoc->ep, it means during this dumping, this asoc was peeled off
> before calling lock_sock(), and the sk should be skipped; If this ep is
> the same with tsp->asoc->ep, it means no peeloff happens on this asoc,
> and due to lock_sock, no peeloff will happen either until release_sock.
>
> Note that delaying endpoint free won't delay the port release, as the
> port release happens in sctp_endpoint_destroy() before calling call_rcu().
> Also, freeing endpoint by call_rcu() makes it safe to access the sk by
> asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().
>
> Thanks Jones to bring this issue up.
>
> v1->v2:
>   - improve the changelog.
>   - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
>
> Reported-by: syzbot+9276d76e83e3bcde6c99@xxxxxxxxxxxxxxxxxxxxxxxxx
> Reported-by: Lee Jones <lee.jones@xxxxxxxxxx>
> Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump")
> Signed-off-by: Xin Long <lucien.xin@xxxxxxxxx>
> ---
>  include/net/sctp/sctp.h    |  6 +++---
>  include/net/sctp/structs.h |  3 ++-
>  net/sctp/diag.c            | 12 ++++++------
>  net/sctp/endpointola.c     | 23 +++++++++++++++--------
>  net/sctp/socket.c          | 23 +++++++++++++++--------
>  5 files changed, 41 insertions(+), 26 deletions(-)

My test has been soaking for about an hour now with no crashes.

So far so good:

Tested-by: Lee Jones <lee.jones@xxxxxxxxxx>

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog