On Tue, 4 Feb 2025 at 10:59, Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> On Tue, Feb 04, 2025 at 01:38:50AM +0100, Michal Luczaj wrote:
> >On 2/3/25 10:57, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit:    c2933b2befe2 Merge tag 'net-6.14-rc1' of git://git.kernel...
> >> git tree:       net-next
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=16f676b0580000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=d033b14aeef39158
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
> >> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13300b24580000
> >> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12418518580000
> >>
> >> Downloadable assets:
> >> disk image: https://storage.googleapis.com/syzbot-assets/c7667ae12603/disk-c2933b2b.raw.xz
> >> vmlinux: https://storage.googleapis.com/syzbot-assets/944ca63002c1/vmlinux-c2933b2b.xz
> >> kernel image: https://storage.googleapis.com/syzbot-assets/30748115bf0b/bzImage-c2933b2b.xz
> >>
> >> The issue was bisected to:
> >>
> >> commit fcdd2242c0231032fc84e1404315c245ae56322a
> >> Author: Michal Luczaj <mhal@xxxxxxx>
> >> Date:   Tue Jan 28 13:15:27 2025 +0000
> >>
> >>     vsock: Keep the binding until socket destruction
> >
> >syzbot is correct (thanks), bisected commit introduced a regression.
> >
> >sock_orphan(sk) is being called without taking into consideration that it
> >does `sk->sk_wq = NULL`. Later, if SO_LINGER is set, sk->sk_wq gets
> >dereferenced in virtio_transport_wait_close().
> >
> >Repro, as shown by syzbot, is simply
> >from socket import *
> >lis = socket(AF_VSOCK, SOCK_STREAM)
> >lis.bind((1, 1234)) # VMADDR_CID_LOCAL
> >lis.listen()
> >s = socket(AF_VSOCK, SOCK_STREAM)
> >s.setsockopt(SOL_SOCKET, SO_LINGER, (1<<32) | 1)
> >s.connect(lis.getsockname())
> >s.close()
> >
> >A way of fixing this is to put sock_orphan(sk) back where it was before the
> >breaking patch and instead explicitly flip just the SOCK_DEAD bit, i.e.
> >
> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >index 075695173648..06250bb9afe2 100644
> >--- a/net/vmw_vsock/af_vsock.c
> >+++ b/net/vmw_vsock/af_vsock.c
> >@@ -824,13 +824,14 @@ static void __vsock_release(struct sock *sk, int level)
> > 	 */
> > 	lock_sock_nested(sk, level);
> >
> >-	sock_orphan(sk);
> >+	sock_set_flag(sk, SOCK_DEAD);
> >
> > 	if (vsk->transport)
> > 		vsk->transport->release(vsk);
> > 	else if (sock_type_connectible(sk->sk_type))
> > 		vsock_remove_sock(vsk);
> >
> >+	sock_orphan(sk);
> > 	sk->sk_shutdown = SHUTDOWN_MASK;
> >
> > 	skb_queue_purge(&sk->sk_receive_queue);
> >
> >I'm not sure this is the most elegant code (sock_orphan(sk) sets SOCK_DEAD
> >on a socket that is already SOCK_DEAD), but here it goes:
> >https://lore.kernel.org/netdev/20250204-vsock-linger-nullderef-v1-0-6eb1760fa93e@xxxxxxx/
>
> What about the fix proposed here:
> https://lore.kernel.org/lkml/20250203124959.114591-1-aha310510@xxxxxxxxx/

mmm, nope, that one will completely bypass the lingering, right?

Stefano

>
> >
> >One more note: man socket(7) says lingering also happens on shutdown().
> >Should vsock follow that?
>
> Good point, I think so.
> IMHO we should handle both of them in af_vsock.c if it's possible, but
> maybe we need a bit of refactoring.
>
> Anyway, net-next material, right?
>
> Stefano
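
A side note on the repro quoted above: the SO_LINGER argument (1<<32) | 1
looks odd, but struct linger is two C ints (l_onoff, l_linger), so an 8-byte
integer with 1 in each 32-bit half has the same bytes as
{ .l_onoff = 1, .l_linger = 1 }. The sketch below only checks that byte-level
equivalence. It assumes 4-byte ints and that the integer reaches the kernel as
a 64-bit value, which is not verified here; the helper name linger_bytes is
made up for illustration.

```python
import struct
import sys


def linger_bytes(onoff: int, seconds: int) -> bytes:
    """Pack (l_onoff, l_linger) as two native C ints, like struct linger.

    Hypothetical helper for illustration; assumes sizeof(int) == 4.
    """
    return struct.pack("ii", onoff, seconds)


# The value used in the repro, viewed as 8 bytes in host byte order.
repro_value = (1 << 32) | 1
repro_bytes = repro_value.to_bytes(8, sys.byteorder)

# Both 32-bit halves hold 1, so the layout matches on either endianness.
same_layout = repro_bytes == linger_bytes(1, 1)
print(same_layout)
```

On other socket families the more usual spelling would be to pass the packed
bytes from linger_bytes(1, 1) to setsockopt() directly.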