On Fri, May 29, 2015 at 01:50:37PM -0300, Marcelo Ricardo Leitner wrote: > On Fri, May 29, 2015 at 09:17:26AM -0400, Neil Horman wrote: > > On Thu, May 28, 2015 at 11:46:29AM -0300, Marcelo Ricardo Leitner wrote: > > > On Thu, May 28, 2015 at 10:27:32AM -0300, Marcelo Ricardo Leitner wrote: > > > > On Thu, May 28, 2015 at 08:17:27AM -0300, Marcelo Ricardo Leitner wrote: > > > > > On Thu, May 28, 2015 at 06:15:11AM -0400, Neil Horman wrote: > > > > > > On Wed, May 27, 2015 at 09:52:17PM -0300, mleitner@xxxxxxxxxx wrote: > > > > > > > From: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> > > > > > > > > > > > > > > ->auto_asconf_splist is per namespace and mangled by functions like > > > > > > > sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization. > > > > > > > > > > > > > > Also, the call to inet_sk_copy_descendant() was backuping > > > > > > > ->auto_asconf_list through the copy but was not honoring > > > > > > > ->do_auto_asconf, which could lead to list corruption if it was > > > > > > > different between both sockets. > > > > > > > > > > > > > > This commit thus fixes the list handling by adding a spinlock to protect > > > > > > > against multiple writers and converts the list to be protected by RCU > > > > > > > too, so that we don't have a lock inverstion issue at > > > > > > > sctp_addr_wq_timeout_handler(). > > > > > > > > > > > > > > And as this list now uses RCU, we cannot do such backup and restore > > > > > > > while copying descendant data anymore as readers may be traversing the > > > > > > > list meanwhile. We fix this by simply ignoring/not copying those fields, > > > > > > > placed at the end of struct sctp_sock, so we can just ignore it together > > > > > > > with struct ipv6_pinfo data. For that we create sctp_copy_descendant() > > > > > > > so we don't clutter inet_sk_copy_descendant() with SCTP info. > > > > > > > > > > > > > > Issue was found with a test application that kept flipping sysctl > > > > > > > default_auto_asconf on and off. > > > > > > > > > > > > > > Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") > > > > > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> > > > > > > > --- > > > > > > > include/net/netns/sctp.h | 6 +++++- > > > > > > > include/net/sctp/structs.h | 2 ++ > > > > > > > net/sctp/protocol.c | 6 +++++- > > > > > > > net/sctp/socket.c | 39 ++++++++++++++++++++++++++------------- > > > > > > > 4 files changed, 38 insertions(+), 15 deletions(-) > > > > > > > > > > > > > > diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h > > > > > > > index 3573a81815ad9e0efb6ceb721eb066d3726419f0..e080bebb3147af39c8275261f57018eb01e917b0 100644 > > > > > > > --- a/include/net/netns/sctp.h > > > > > > > +++ b/include/net/netns/sctp.h > > > > > > > @@ -30,12 +30,15 @@ struct netns_sctp { > > > > > > > struct list_head local_addr_list; > > > > > > > struct list_head addr_waitq; > > > > > > > struct timer_list addr_wq_timer; > > > > > > > - struct list_head auto_asconf_splist; > > > > > > > + struct list_head __rcu auto_asconf_splist; > > > > > > You should use the addr_wq_lock here instead of creating a new lock, as thats > > > > > > already used to protect most accesses to the list you are concerned about. > > > > > > > > > > Ok, that works too. > > > > > > > > > > > Though truthfully, that shouldn't be necessecary. The list in question is only > > > > > > read in one location and only written in one location. You can likely just > > > > > > rcu-ify, as the write side is in process context and protected by lock_sock. > > > > > > > > > > It should, it's not protected by lock_sock as this list resides in > > > > > netns_sctp structure, which lock_sock doesn't cover. Write side is in > > > > > process context yes, but this list is written in sctp_init_sock(), > > > > > sctp_destroy_sock() and sctp_setsockopt_auto_asconf(), so one could > > > > > trigger this by either creating/destroying sockets if > > > > > default_auto_asconf=1 or just by creating a bunch of sockets and > > > > > flipping asconf via setsockopt (or a combination of these operations). > > > > > (I'll point this out in the changelog) > > > > > > > > Hmm.. by reusing addr_wq_lock we don't need to rcu-ify the list, as the > > > > reader is inside that lock too, so I can just protect auto_asconf_splist > > > > writers with addr_wq_lock. > > > > > > > > Nice, thanks Neil. > > > > > > Cannot really do that.. as that creates a lock inversion between > > > sctp_destroy_sock() (which already holds lock_sock) and > > > sctp_addr_wq_timeout_handler(), which first grabs addr_wq_lock and then > > > locks socket by socket. > > > > > > Due to that, I'm afraid reusing this lock is not possible, and we should > > > stick with the patch.. what do you think? (though I have to fix the nits > > > in there) > > > > > I don't think thats accurate. You are correct in that the the locks are taken > > in opposing order, which would imply a lock inversion that could result in > > deadlock, but we can avoid that by deferring the asconf list removal until after > > sk_common_release and unlock_sock_bh is called in sctp_close. That will make > > the lock ordering consistent. Alternatively, we can pre-emptively take the > > asconf_lock in sctp_close before locking the socket. > > For your first approach, deferring the asconf list removal, we can only > do that reliably via some work queue, because we initialize asconf stuff > on sctp_init_sock() and it should be de-initialized on its counterpart, > sctp_destroy_sock(), as we have code like: > > (same for ipv4) > sctp_v6_create_accept_sk() > { > ... > if (newsk->sk_prot->init(newsk)) { > sk_common_release(newsk); > newsk = NULL; > } > ... > } > > and at inet6_create() too: > if (sk->sk_prot->init) { > err = sk->sk_prot->init(sk); > if (err) > sk_common_release(sk); > } > > Or we (kind of) abuse of knowing that sctp_init_sock() cannot fail after > initializing asconf and move asconf stuff from sctp_destroy_sock() to > sctp_close(). AFAICT it could be enough, I'm just not a big fan of not > having that similarity. > > If we try to lock addr_wq_lock early in sctp_close(), > sctp_destroy_sock() would be unprotected on above situations, but if we > know that sctp_init_sock() won't fail after initilizating asconf, it > wouldn't be a problem... > > > I'd really rather avoid creating an additional lock here if we don't have to > > I understand. I'm just not seeing another way out so far... I'll keep > trying, but please I'm all ears to ideas ;) > > Marcelo rcu-ify auto_asconf_splist, then just use the addr_wq_lock to protect the list modification sites, as those are all handled at locations that already hold a socket lock. sctp_addr_wq_timeout_handler is a read-only function for auto_asconf_splist, and so the spin_lock_bh there can be replaced with an rcu_read_lock_bh call, breaking the lock inversion Neil -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html