Re: KASAN: use-after-free Read in __lock_sock

Lee Jones <lee.jones@xxxxxxxxxx> · Mon, 6 Dec 2021 19:21:03 +0000

Networking Gurus,

On Thu, 22 Nov 2018, Marcelo Ricardo Leitner wrote:
> On Thu, Nov 22, 2018 at 10:44:16PM +0900, Xin Long wrote:
> > On Thu, Nov 22, 2018 at 10:13 PM Marcelo Ricardo Leitner
> > <marcelo.leitner@xxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 19, 2018 at 05:57:33PM +0900, Xin Long wrote:
> > > > On Sat, Nov 17, 2018 at 4:18 PM syzbot
> > > > <syzbot+9276d76e83e3bcde6c99@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit:    ccda4af0f4b9 Linux 4.20-rc2
> > > > > git tree:       upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x6cd533400000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?xJ0a89f12ca9b0f5
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid’76d76e83e3bcde6c99
> > > > > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > > >
> > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > Reported-by: syzbot+9276d76e83e3bcde6c99@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > >
> > > > > netlink: 5 bytes leftover after parsing attributes in process
> > > > > `syz-executor5'.
> > > > > =================================
> > > > > BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
> > > > > kernel/locking/lockdep.c:3218
> > > > > Read of size 8 at addr ffff8881d26d60e0 by task syz-executor1/13725
> > > > >
> > > > > CPU: 0 PID: 13725 Comm: syz-executor1 Not tainted 4.20.0-rc2+ #333
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > > Google 01/01/2011
> > > > > Call Trace:
> > > > >   __dump_stack lib/dump_stack.c:77 [inline]
> > > > >   dump_stack+0x244/0x39d lib/dump_stack.c:113
> > > > >   print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
> > > > >   kasan_report_error mm/kasan/report.c:354 [inline]
> > > > >   kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
> > > > >   __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> > > > >   __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
> > > > >   lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
> > > > >   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
> > > > >   _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
> > > > >   spin_lock_bh include/linux/spinlock.h:334 [inline]
> > > > >   __lock_sock+0x203/0x350 net/core/sock.c:2253
> > > > >   lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
> > > > >   lock_sock include/net/sock.h:1492 [inline]
> > > > >   sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
> > > >
> > > > static int sctp_sock_dump(struct sctp_transport *tsp, void *p)
> > > > {
> > > >         struct sctp_endpoint *ep = tsp->asoc->ep;
> > > >         struct sctp_comm_param *commp = p;
> > > >         struct sock *sk = ep->base.sk; <--- [1]
> > > > ...
> > > >         int err = 0;
> > > >
> > > >         lock_sock(sk);  <--- [2]
> > > >
> > > > Between [1] and [2], an asoc peeloff may happen, still thinking
> > > > how to avoid this.
> > >
> > > This race cannot happen more than once for an asoc, so something
> > > like this may be doable:
> > >
> > >         struct sctp_comm_param *commp = p;
> > >         struct sctp_endpoint *ep;
> > >         struct sock *sk;
> > > ...
> > >         int err = 0;
> > >
> > > again:
> > >         ep = tsp->asoc->ep;
> > >         sk = ep->base.sk; <---[3]
> > >         lock_sock(sk);  <--- [2]
> > if peel-off happens between [3] and [2], and sk is freed
> > somewhere, it will panic on [2] when trying to get the
> > sk->lock, no?
> 
> Not sure what protects it, but this construct is also used in BH processing at
> sctp_rcv():
> ...
>         bh_lock_sock(sk); [4]
> 
>         if (sk != rcvr->sk) {
>                 /* Our cached sk is different from the rcvr->sk.  This is
>                  * because migrate()/accept() may have moved the association
>                  * to a new socket and released all the sockets.  So now we
>                  * are holding a lock on the old socket while the user may
>                  * be doing something with the new socket.  Switch our veiw
>                  * of the current sk.
>                  */
>                 bh_unlock_sock(sk);
>                 sk = rcvr->sk;
>                 bh_lock_sock(sk);
>         }
> ...
> 
> If it is not safe, then we have an issue there too.
> And by [4] that copy on sk is pretty old already.
> 
> > 
> > >         if (sk != tsp->asoc->ep->base.sk) {
> > >                 /* Asoc was peeloff'd */
> > >                 unlock_sock(sk);
> > >                 goto again;
> > >         }
> > >
> > > Similarly to what we did on cea0cc80a677 ("sctp: use the right sk
> > > after waking up from wait_buf sleep").

I'm currently debugging something similar (the same perhaps) on an
older Stable kernel.  However the same Repro which was found and
authored for a production level mobile phone, doesn't seem to trigger
on an x86_64 qemu instance running Mainline.

That's not to say that Mainline isn't still suffering with this issue
though.  It probably has more to do with the specificity of the Repro
which was designed specifically to trigger on the target device.

Does anyone know if this particular issue was ever patched?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog