On Sat, Mar 21, 2020 at 09:23:54AM +0800, Qiujun Huang wrote:
> On Sat, Mar 21, 2020 at 9:02 AM Marcelo Ricardo Leitner
> <marcelo.leitner@xxxxxxxxx> wrote:
> >
> > On Sat, Mar 21, 2020 at 07:53:29AM +0800, Qiujun Huang wrote:
> > ...
> > > > > So, sctp_wfree was not called to destroy SKB)
> > > > >
> > > > > then migrate happened
> > > > >
> > > > >       sctp_for_each_tx_datachunk(
> > > > >       sctp_clear_owner_w);
> > > > >       sctp_assoc_migrate();
> > > > >       sctp_for_each_tx_datachunk(
> > > > >       sctp_set_owner_w);
> > > > > SKB was not in the outq, and was not changed to newsk
> > > >
> > > > The real fix is to fix the migration to the new socket, though the
> > > > situation on which it is happening is still not clear.
> > > >
> > > > The 2nd sendto() call on the reproducer is sending 212992 bytes on a
> > > > single call. That's usually the whole sndbuf size, and will cause
> > > > fragmentation to happen. That means the datamsg will contain several
> > > > skbs. But still, the sacked chunks should be freed if needed while the
> > > > remaining ones will be left on the queues that they are.
> > >
> > > in sctp_sendmsg_to_asoc
> > > datamsg holds his chunk result in that the sacked chunks can't be freed
> >
> > Right! Now I see it, thanks.
> > In the end, it's not a locking race condition. It's just not iterating
> > on the lists properly.
> >
> > >
> > >         list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
> > >                 sctp_chunk_hold(chunk);
> > >                 sctp_set_owner_w(chunk);
> > >                 chunk->transport = transport;
> > >         }
> > >
> > > any ideas to handle it?
> >
> > sctp_for_each_tx_datachunk() needs to be aware of this situation.
> > Instead of iterating directly/only over the chunk list, it should
> > iterate over the datamsgs instead. Something like the below (just
> > compile tested).
> >
> > Then, the old socket will be free to die regardless of the new one.
> > Otherwise, if this association gets stuck on retransmissions or so,
> > the old socket would not be freed till then.
> >
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index fed26a1e9518..85c742310d26 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -151,9 +151,10 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
> >                                        void (*cb)(struct sctp_chunk *))
> >
> >  {
> > +       struct sctp_datamsg *msg, *prev_msg = NULL;
> >         struct sctp_outq *q = &asoc->outqueue;
> >         struct sctp_transport *t;
> > -       struct sctp_chunk *chunk;
> > +       struct sctp_chunk *chunk, *c;

I forgot to swap some lines here for reverse Christmas-tree style,
btw.

> >
> >         list_for_each_entry(t, &asoc->peer.transport_addr_list, transports)
> >                 list_for_each_entry(chunk, &t->transmitted, transmitted_list)
> > @@ -162,8 +163,14 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
> >         list_for_each_entry(chunk, &q->retransmit, transmitted_list)
> >                 cb(chunk);
> >
> > -       list_for_each_entry(chunk, &q->sacked, transmitted_list)
> > -               cb(chunk);
> > +       list_for_each_entry(chunk, &q->sacked, transmitted_list) {
> > +               msg = chunk->msg;
> > +               if (msg == prev_msg)
> > +                       continue;
> > +               list_for_each_entry(c, &msg->chunks, frag_list)
> > +                       cb(c);
> > +               prev_msg = msg;
> > +       }
>
> great. I'll trigger a syzbot test. Thanks.

Mind that it may need to be handled on the other lists as well. I didn't
check them :]

>
> >
> >         list_for_each_entry(chunk, &q->abandoned, transmitted_list)
> >                 cb(chunk);
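
For anyone following along outside the kernel tree, here is a rough
userspace sketch of the iteration idea in the diff above: walk a queue of
chunks, but run the callback on every fragment of each chunk's message,
including fragments that are not on that queue. The struct and function
names (struct msg, struct chunk, for_each_msg_chunk, mark,
demo_visited_fragments) are made up for illustration and are not the SCTP
types; plain pointers stand in for the kernel list helpers.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for sctp_datamsg / sctp_chunk. */
struct msg;

struct chunk {
	struct msg *msg;           /* owning message */
	struct chunk *next_frag;   /* next fragment of the same message */
	struct chunk *next_queued; /* next chunk on the queue being walked */
	int visited;               /* how many times the callback ran */
};

struct msg {
	struct chunk *frags;       /* all fragments of this message */
};

/* Example callback: just count visits. */
static void mark(struct chunk *c)
{
	c->visited++;
}

/*
 * Walk a queue, but apply cb to every fragment of each message found on
 * it. Consecutive queue entries that belong to the same message are
 * skipped, mirroring the prev_msg check in the diff.
 */
static void for_each_msg_chunk(struct chunk *queue,
			       void (*cb)(struct chunk *))
{
	struct msg *prev = NULL;
	struct chunk *c, *f;

	for (c = queue; c; c = c->next_queued) {
		if (c->msg == prev)
			continue;
		for (f = c->msg->frags; f; f = f->next_frag)
			cb(f);
		prev = c->msg;
	}
}

/*
 * One message with three fragments; only the first two sit on the queue.
 * Returns the total number of callback invocations.
 */
static int demo_visited_fragments(void)
{
	struct msg m;
	struct chunk c[3];
	int i;

	for (i = 0; i < 3; i++) {
		c[i].msg = &m;
		c[i].next_frag = (i < 2) ? &c[i + 1] : NULL;
		c[i].next_queued = NULL;
		c[i].visited = 0;
	}
	m.frags = &c[0];
	c[0].next_queued = &c[1]; /* queue: c0 -> c1, c2 is elsewhere */

	for_each_msg_chunk(&c[0], mark);
	return c[0].visited + c[1].visited + c[2].visited;
}
```

Note that the prev check only skips adjacent entries of the same message.
If chunks from different messages were interleaved on a queue, a message
would be visited more than once in this sketch; whether that can happen
on the real queues is not something the sketch tries to answer.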