On Wed, Oct 10, 2018 at 08:28:22PM +0200, Dmitry Vyukov wrote:
> On Wed, Oct 10, 2018 at 8:13 PM, Marcelo Ricardo Leitner
> <marcelo.leitner@xxxxxxxxx> wrote:
> > On Wed, Oct 10, 2018 at 05:28:12PM +0200, Dmitry Vyukov wrote:
> >> On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
> >> <marcelo.leitner@xxxxxxxxx> wrote:
> >> > On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
> >> >> Hello,
> >> >>
> >> >> syzbot found the following crash on:
> >> >>
> >> >> HEAD commit:    4e6d47206c32 tls: Add support for inplace records encryption
> >> >> git tree:       net-next
> >> >> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
> >> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
> >> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
> >> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >> >>
> >> >> Unfortunately, I don't have any reproducer for this crash yet.
> >> >>
> >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> >> Reported-by: syzbot+c7dd55d7aec49d48e49a@xxxxxxxxxxxxxxxxxxxxxxxxx
> >> >>
> >> >> netlink: 'syz-executor1': attribute type 1 has an invalid length.
> >> >> ==================================================================
> >> >> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
> >> >> net/sctp/socket.c:276
> >> >> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
> >> >>
> >> >> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
> >> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> >> Google 01/01/2011
> >> >> Call Trace:
> >> >>  __dump_stack lib/dump_stack.c:77 [inline]
> >> >>  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
> >> >>  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
> >> >>  kasan_report_error mm/kasan/report.c:354 [inline]
> >> >>  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
> >> >>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> >> >>  sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
> >> >
> >> > I'm not seeing yet how this could happen.
> >> > All sockopts here are serialized by sock_lock.
> >> > do_peeloff here would create another socket, but the issue was
> >> > triggered before that.
> >> > The same function that freed this memory, also removes the entry from
> >> > idr mapping, so this entry shouldn't be there anymore.
> >> >
> >> > I have only two theories so far:
> >> > - an issue with IDR/RCU.
> >> > - something else happened that just the call stacks are not revealing.
> >>
> >> The "asoc->base.sk != sk" check after idr_find suggests that we don't
> >> actually know what sock it belongs to. And if we don't know then
> >
> > Right. The check is more because the IDR is global and not per socket
> > (and we don't want sockets accessing asocs from other sockets), and not
> > that the asoc may move to another socket in between, but it also
> > protects from such cases, yes.
> >
> >> locking this sock can't help keeping another sock association alive.
> >> Am I missing something obvious here? Should we take assoc ref while we
> >
> > Not sure. Maybe I am.
> > Thanks for looking into this, btw.
> >
> >> are still holding sctp_assocs_id_lock?
> >
> > Shouldn't be needed.
> >
> > Solely by the call stacks:
> > - we tried to establish a new asoc from a sctp_connect() call,
> >   blocking one.
> > - it slept waiting for the connect
> > - (something closed the asoc in between the sleeps, because it freed
> >   the asoc right when waking up on sctp_wait_for_connect())
> > - it freed the asoc after sleeping on it on sctp_wait_for_connect [A]
> > - another thread tried to peeloff that asoc [B]
> >
> > For [B] to access the asoc in question, it had to take the same sock
> > lock [A] had taken, and then the idr should not return an asoc in
> > sctp_id2assoc(). Note that we can't peeloff an asoc twice, hence
> > the certainty here.
> >
> > If [B] actually kicked in before the sleep resumed, that should have
> > been fine because it took the same sock lock [A] would have to
> > re-take. In this case an asoc would have been returned by
> > sctp_id2assoc(), the asoc would have been moved to a new socket, but
> > all while holding the original socket sock lock.
>
> But why do A and B use the same lock?
>
> sctp_assocs_id is global, so it contains asocs from all sockets, right?
> assoc id comes straight from userspace.
> So isn't it possible that B uses completely different sock but passes
> assoc id from the A sock? Then B should find assoc in sctp_assocs_id,
> and at the point of "asoc->base.sk != sk" check the assoc can be
> already freed.

That explains it. Somehow I was thinking the issue was for reading
->dead instead. Now it's pretty clear.

Then this should be the patch we want. Can you please give it a spin?
(only compile tested)

While holding the spinlock, an entry cannot be removed from the idr
and thus it cannot be freed. So even if the app passes an id that
belongs to another socket, the asoc is still valid at the point where
we check ->base.sk and ->base.dead.
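
To illustrate the rule in isolation, here is a rough userspace model
(a sketch only, not kernel code). Everything in it is made up for the
example: table[] stands in for the global idr, table_lock for
sctp_assocs_id_lock, struct owner for struct sock and struct entry for
the asoc. It also assumes, as the kernel side does, that whoever frees
an entry first removes it from the table under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ID 16

struct owner { int unused; };		/* hypothetical stand-in for struct sock */

struct entry {				/* hypothetical stand-in for the asoc */
	struct owner *owner;
	bool dead;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *table[MAX_ID];	/* global id -> entry map, shared by all owners */

/* Publish an entry under the lock. */
static void table_insert(int id, struct entry *e)
{
	assert(id >= 0 && id < MAX_ID);
	pthread_mutex_lock(&table_lock);
	table[id] = e;
	pthread_mutex_unlock(&table_lock);
}

/* Unpublish under the lock; the caller may free the entry only afterwards. */
static struct entry *table_remove(int id)
{
	struct entry *e;

	assert(id >= 0 && id < MAX_ID);
	pthread_mutex_lock(&table_lock);
	e = table[id];
	table[id] = NULL;
	pthread_mutex_unlock(&table_lock);
	return e;
}

/*
 * Look up an id that may belong to any owner. The entry is dereferenced
 * only while table_lock is held, so it cannot have been removed (and
 * therefore freed) between the lookup and the ownership check.
 */
static struct entry *lookup_checked(struct owner *o, int id)
{
	struct entry *e;

	if (id < 0 || id >= MAX_ID)
		return NULL;

	pthread_mutex_lock(&table_lock);
	e = table[id];
	if (e && (e->owner != o || e->dead))
		e = NULL;
	pthread_mutex_unlock(&table_lock);

	return e;
}
```

Note that the model only covers the check itself. A non-NULL return
is safe to use afterwards only because the caller already holds its
own owner-level lock (the sock lock in our case), which is what
serializes it against removal of entries belonging to that owner.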
---8<---

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index f73e9d38d5ba..a7722f43aa69 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -271,11 +271,10 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
 
 	spin_lock_bh(&sctp_assocs_id_lock);
 	asoc = (struct sctp_association *)idr_find(&sctp_assocs_id, (int)id);
+	if (asoc && (asoc->base.sk != sk || asoc->base.dead))
+		asoc = NULL;
 	spin_unlock_bh(&sctp_assocs_id_lock);
 
-	if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
-		return NULL;
-
 	return asoc;
 }