On Wed, 2020-12-09 at 11:05 +0100, Peter Zijlstra wrote:
> > [   47.844585]  Possible unsafe locking scenario:
> >
> > [   47.844586]        CPU0                    CPU1
> > [   47.844586]        ----                    ----
> > [   47.844587]   lock(&h->listening_hash[i].lock);
> > [   47.844588]                                lock((softirq_ctrl.lock).lock);
> > [   47.844588]                                lock(&h->listening_hash[i].lock);
> > [   47.844589]   lock((softirq_ctrl.lock).lock);
> > [   47.844590]
> >                  *** DEADLOCK ***
>
> So I've been looking at these local_lock vs lockdep splats for a bit,
> and unlike the IRQ inversions as reported here:
>
>   https://lore.kernel.org/linux-usb/20201029174348.omqiwjqy64tebg5z@xxxxxxxxxxxxx/
>
> I think the above is an actual real problem (for RT).
>
> AFAICT the above translates to:
>
>   inet_listen()
>     lock_sock()
>       spin_lock_bh(&sk->sk_lock.slock);
>         acquire(softirq_ctrl);
>         acquire(&sk->sk_lock.slock);
>
>     inet_csk_listen_start()
>       sk->sk_prot->hash() := inet_hash()
>         local_bh_disable()
>         __inet_hash()
>           spin_lock(&ilb->lock);
>             acquire(&ilb->lock);
>
>   ----
>
>   tcp4_seq_next()
>     listening_get_next()
>       spin_lock(&ilb->lock);
>         acquire(&ilb->lock);
>
>   tcp4_seq_show()
>     get_tcp4_sock()
>       sock_i_ino()
>         read_lock_bh(&sk->sk_callback_lock);
>           acquire(softirq_ctrl)  // <---- whoops
>           acquire(&sk->sk_callback_lock)
>
> Which you can run in two tasks on the same CPU (and thus get the same
> softirq_ctrl local_lock), and deadlock.
>
> By holding softirq_ctrl we serialize against softirq context
> (in-softirq), but that isn't relevant here, since neither context is
> that.
>
> On !RT there isn't a problem, because softirq_ctrl isn't an actual
> lock, but the moment it turns into a real lock (like on RT) you're up
> a creek.
>
> In general we have the rule that as long as a lock is only ever used
> from task context (like the above ilb->lock, AFAICT) then it doesn't
> matter if you also take it with (soft)irqs disabled or not. But this
> softirq scheme breaks that.
> If you ever take a lock with BH disabled, you must now always take it
> with BH disabled, otherwise you risk deadlocks against the
> softirq_ctrl lock.
>
> Or am I missing something obvious (again) ?

Sebastian fixed this via...

From 0fe43be6c32e05d0dd692069d41a40c5453a2195 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Date: Mon, 12 Oct 2020 17:33:54 +0200
Subject: tcp: Remove superfluous BH-disable around listening_hash

Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations") removed
the need to disable bottom halves while acquiring listening_hash.lock.
There are still two callers left which disable bottom halves before the
lock is acquired.

Drop local_bh_disable() around __inet_hash(), which acquires
listening_hash->lock, and invoke inet_ehash_nolisten() with BH
disabled. inet_unhash() conditionally acquires listening_hash->lock.

Reported-by: Mike Galbraith <efault@xxxxxx>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/linux-rt-users/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@xxxxxx/
Signed-off-by: Mike Galbraith <efault@xxxxxx>
---
 net/ipv4/inet_hashtables.c  | 19 ++++++++++++-------
 net/ipv6/inet6_hashtables.c |  5 +----
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 45fb450b4522..5fb95030e7c0 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -635,7 +635,9 @@ int __inet_hash(struct sock *sk, struct sock *osk)
 	int err = 0;
 
 	if (sk->sk_state != TCP_LISTEN) {
+		local_bh_disable();
 		inet_ehash_nolisten(sk, osk, NULL);
+		local_bh_enable();
 		return 0;
 	}
 	WARN_ON(!sk_unhashed(sk));
@@ -667,11 +669,8 @@ int inet_hash(struct sock *sk)
 {
 	int err = 0;
 
-	if (sk->sk_state != TCP_CLOSE) {
-		local_bh_disable();
+	if (sk->sk_state != TCP_CLOSE)
 		err = __inet_hash(sk, NULL);
-		local_bh_enable();
-	}
 
 	return err;
 }
@@ -682,17 +681,20 @@ void inet_unhash(struct sock *sk)
 	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
 	struct inet_listen_hashbucket *ilb = NULL;
 	spinlock_t *lock;
+	bool state_listen;
 
 	if (sk_unhashed(sk))
 		return;
 
 	if (sk->sk_state == TCP_LISTEN) {
+		state_listen = true;
 		ilb = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)];
-		lock = &ilb->lock;
+		spin_lock(&ilb->lock);
 	} else {
+		state_listen = false;
 		lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
+		spin_lock_bh(lock);
 	}
-	spin_lock_bh(lock);
 	if (sk_unhashed(sk))
 		goto unlock;
 
@@ -705,7 +707,10 @@ void inet_unhash(struct sock *sk)
 	__sk_nulls_del_node_init_rcu(sk);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
 unlock:
-	spin_unlock_bh(lock);
+	if (state_listen)
+		spin_unlock(&ilb->lock);
+	else
+		spin_unlock_bh(lock);
 }
 EXPORT_SYMBOL_GPL(inet_unhash);
 
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 55c290d55605..9bad345cba9a 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -333,11 +333,8 @@ int inet6_hash(struct sock *sk)
 {
 	int err = 0;
 
-	if (sk->sk_state != TCP_CLOSE) {
-		local_bh_disable();
+	if (sk->sk_state != TCP_CLOSE)
 		err = __inet_hash(sk, NULL);
-		local_bh_enable();
-	}
 
 	return err;
 }
-- 
2.29.2