Patch "netlink: disable IRQs for netlink_lock_table()" has been added to the 4.4-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Thu, 10 Jun 2021 22:17:14 -0400

This is a note to let you know that I've just added the patch titled

    netlink: disable IRQs for netlink_lock_table()

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     netlink-disable-irqs-for-netlink_lock_table.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit a5f453838e843ec7088c6dd7fb72423f036469f0
Author: Johannes Berg <johannes.berg@xxxxxxxxx>
Date:   Mon May 17 16:38:09 2021 +0200

    netlink: disable IRQs for netlink_lock_table()
    
    [ Upstream commit 1d482e666b8e74c7555dbdfbfb77205eeed3ff2d ]
    
    Syzbot reports that in mac80211 we have a potential deadlock
    between our "local->queue_stop_reason_lock" (spinlock) and
    netlink's nl_table_lock (rwlock). This is because there's at
    least one situation in which we might try to send a netlink
    message with this spinlock held while it is also possible to
    take the spinlock from a hardirq context, resulting in the
    following deadlock scenario reported by lockdep:
    
           CPU0                    CPU1
           ----                    ----
      lock(nl_table_lock);
                                   local_irq_disable();
                                   lock(&local->queue_stop_reason_lock);
                                   lock(nl_table_lock);
      <Interrupt>
        lock(&local->queue_stop_reason_lock);
    
    This seems valid: we can take the queue_stop_reason_lock in
    any kind of context ("CPU0"), and call ieee80211_report_ack_skb()
    with the spinlock held and IRQs disabled ("CPU1") in some
    code path (ieee80211_do_stop() via ieee80211_free_txskb()).
    
    Short of disallowing netlink use in scenarios like these
    (which would be rather complex in mac80211's case due to
    the deep callchain), it seems the only fix for this is to
    disable IRQs while nl_table_lock is held to avoid hitting
    this scenario. This disallows the "CPU0" portion of the
    reported deadlock.
    
    Note that the writer side (netlink_table_grab()) already
    disables IRQs for this lock.
    
    Unfortunately though, this seems like a huge hammer, and
    maybe the whole netlink table locking should be reworked.
    
    Reported-by: syzbot+69ff9dff50dcfe14ddd4@xxxxxxxxxxxxxxxxxxxxxxxxx
    Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx>
    Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index cc37a219e11e..c20c41801845 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -436,11 +436,13 @@ void netlink_table_ungrab(void)
 static inline void
 netlink_lock_table(void)
 {
+	unsigned long flags;
+
 	/* read_lock() synchronizes us to netlink_table_grab */
 
-	read_lock(&nl_table_lock);
+	read_lock_irqsave(&nl_table_lock, flags);
 	atomic_inc(&nl_table_users);
-	read_unlock(&nl_table_lock);
+	read_unlock_irqrestore(&nl_table_lock, flags);
 }
 
 static inline void
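
For readers who want to see why disabling IRQs on the read side removes
the reported inversion, here is a minimal userspace sketch of the lock
ordering described in the commit message. It is a toy model, not kernel
code and not how lockdep is implemented: the enum values, struct
lock_graph, add_dep(), build_graph() and has_inversion() are all
hypothetical names invented for this illustration. It only records which
lock may be acquired while another is held and checks for a pair of
opposite orderings.

```c
#include <stdbool.h>

/* Hypothetical identifiers for this model; these are not kernel symbols. */
enum lock_id { NL_TABLE_LOCK, QUEUE_STOP_REASON_LOCK, NR_LOCKS };

/* dep[a][b] is true when lock b may be acquired while lock a is held. */
struct lock_graph {
	bool dep[NR_LOCKS][NR_LOCKS];
};

static void add_dep(struct lock_graph *g, enum lock_id held, enum lock_id next)
{
	g->dep[held][next] = true;
}

/* Record the orderings from the scenario in the commit message. */
static struct lock_graph build_graph(bool reader_disables_irqs)
{
	struct lock_graph g = { 0 };

	/* "CPU1": a netlink message is sent with the spinlock held. */
	add_dep(&g, QUEUE_STOP_REASON_LOCK, NL_TABLE_LOCK);

	/*
	 * "CPU0": an interrupt handler that takes the spinlock can only
	 * run inside the read-side critical section if IRQs are still
	 * enabled there.
	 */
	if (!reader_disables_irqs)
		add_dep(&g, NL_TABLE_LOCK, QUEUE_STOP_REASON_LOCK);

	return g;
}

/* Report whether any two locks can be taken in both orders. */
static bool has_inversion(const struct lock_graph *g)
{
	for (int a = 0; a < NR_LOCKS; a++)
		for (int b = 0; b < NR_LOCKS; b++)
			if (a != b && g->dep[a][b] && g->dep[b][a])
				return true;
	return false;
}
```

In this model, build_graph(false) corresponds to the old read_lock()
path and contains both orderings, while build_graph(true) corresponds
to the read_lock_irqsave() path in the patch and contains only the
"CPU1" ordering.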