On Thu, Nov 22, 2018 at 10:44:16PM +0900, Xin Long wrote: > On Thu, Nov 22, 2018 at 10:13 PM Marcelo Ricardo Leitner > <marcelo.leitner@xxxxxxxxx> wrote: > > > > On Mon, Nov 19, 2018 at 05:57:33PM +0900, Xin Long wrote: > > > On Sat, Nov 17, 2018 at 4:18 PM syzbot > > > <syzbot+9276d76e83e3bcde6c99@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > Hello, > > > > > > > > syzbot found the following crash on: > > > > > > > > HEAD commit: ccda4af0f4b9 Linux 4.20-rc2 > > > > git tree: upstream > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=156cd533400000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=4a0a89f12ca9b0f5 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=9276d76e83e3bcde6c99 > > > > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > > > > > > > Unfortunately, I don't have any reproducer for this crash yet. > > > > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > > > Reported-by: syzbot+9276d76e83e3bcde6c99@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > > > > > netlink: 5 bytes leftover after parsing attributes in process > > > > `syz-executor5'. > > > > ================================================================== > > > > BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 > > > > kernel/locking/lockdep.c:3218 > > > > Read of size 8 at addr ffff8881d26d60e0 by task syz-executor1/13725 > > > > > > > > CPU: 0 PID: 13725 Comm: syz-executor1 Not tainted 4.20.0-rc2+ #333 > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > > > Google 01/01/2011 > > > > Call Trace: > > > > __dump_stack lib/dump_stack.c:77 [inline] > > > > dump_stack+0x244/0x39d lib/dump_stack.c:113 > > > > print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 > > > > kasan_report_error mm/kasan/report.c:354 [inline] > > > > kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 > > > > __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 > > > > __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 > > > > lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 > > > > __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] > > > > _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 > > > > spin_lock_bh include/linux/spinlock.h:334 [inline] > > > > __lock_sock+0x203/0x350 net/core/sock.c:2253 > > > > lock_sock_nested+0xfe/0x120 net/core/sock.c:2774 > > > > lock_sock include/net/sock.h:1492 [inline] > > > > sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324 > > > > > > static int sctp_sock_dump(struct sctp_transport *tsp, void *p) > > > { > > > struct sctp_endpoint *ep = tsp->asoc->ep; > > > struct sctp_comm_param *commp = p; > > > struct sock *sk = ep->base.sk; <--- [1] > > > ... > > > int err = 0; > > > > > > lock_sock(sk); <--- [2] > > > > > > Between [1] and [2], an asoc peeloff may happen, still thinking > > > how to avoid this. > > > > This race cannot happen more than once for an asoc, so something > > like this may be doable: > > > > struct sctp_comm_param *commp = p; > > struct sctp_endpoint *ep; > > struct sock *sk; > > ... > > int err = 0; > > > > again: > > ep = tsp->asoc->ep; > > sk = ep->base.sk; <---[3] > > lock_sock(sk); <--- [2] > if peel-off happens between [3] and [2], and sk is freed > somewhere, it will panic on [2] when trying to get the > sk->lock, no? Not sure what protects it, but this construct is also used in BH processing at sctp_rcv(): ... bh_lock_sock(sk); [4] if (sk != rcvr->sk) { /* Our cached sk is different from the rcvr->sk. This is * because migrate()/accept() may have moved the association * to a new socket and released all the sockets. So now we * are holding a lock on the old socket while the user may * be doing something with the new socket. Switch our veiw * of the current sk. */ bh_unlock_sock(sk); sk = rcvr->sk; bh_lock_sock(sk); } ... If it is not safe, then we have an issue there too. And by [4] that copy on sk is pretty old already. > > > if (sk != tsp->asoc->ep->base.sk) { > > /* Asoc was peeloff'd */ > > unlock_sock(sk); > > goto again; > > } > > > > Similarly to what we did on cea0cc80a677 ("sctp: use the right sk > > after waking up from wait_buf sleep").