Patch "netfilter: conntrack: fix crash due to confirmed bit load reordering" has been added to the 5.18-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    netfilter: conntrack: fix crash due to confirmed bit load reordering

to the 5.18-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     netfilter-conntrack-fix-crash-due-to-confirmed-bit-l.patch
and it can be found in the queue-5.18 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 5a24db7224caa08286e5971ebb35ec019e00c06a
Author: Florian Westphal <fw@xxxxxxxxx>
Date:   Wed Jul 6 16:50:04 2022 +0200

    netfilter: conntrack: fix crash due to confirmed bit load reordering
    
    [ Upstream commit 0ed8f619b412b52360ccdfaf997223ccd9319569 ]
    
    Kajetan Puchalski reports crash on ARM, with backtrace of:
    
    __nf_ct_delete_from_lists
    nf_ct_delete
    early_drop
    __nf_conntrack_alloc
    
    Unlike atomic_inc_not_zero, refcount_inc_not_zero is not a full barrier.
    conntrack uses SLAB_TYPESAFE_BY_RCU, i.e. it is possible that a 'newly'
    allocated object is still in use on another CPU:
    
    CPU1                                            CPU2
                                                    encounter 'ct' during hlist walk
     delete_from_lists
     refcount drops to 0
     kmem_cache_free(ct);
     __nf_conntrack_alloc() // returns same object
                                                    refcount_inc_not_zero(ct); /* might fail */
    
                                                    /* If set, ct is public/in the hash table */
                                                    test_bit(IPS_CONFIRMED_BIT, &ct->status);
    
    In case CPU1 already set refcount back to 1, refcount_inc_not_zero()
    will succeed.
    
    The expected possibilities for a CPU that obtained the object 'ct'
    (but no reference so far) are:
    
    1. refcount_inc_not_zero() fails.  CPU2 ignores the object and moves to
       the next entry in the list.  This happens for objects that are about
       to be free'd, that have been free'd, or that have been reallocated
       by __nf_conntrack_alloc(), but where the refcount has not been
       increased back to 1 yet.
    
    2. refcount_inc_not_zero() succeeds. CPU2 checks the CONFIRMED bit
       in ct->status.  If set, the object is public/in the table.
    
       If not, the object must be skipped; CPU2 calls nf_ct_put() to
       un-do the refcount increment and moves to the next object.
    
    Parallel deletion from the hlists is prevented by a
    'test_and_set_bit(IPS_DYING_BIT, &ct->status);' check, i.e. only one
    cpu will do the unlink, the other one will only drop its reference count.
    
    Because refcount_inc_not_zero is not a full barrier, CPU2 may try to
    delete an object that is not on any list:
    
    1. refcount_inc_not_zero() successful (refcount inited to 1 on other CPU)
    2. CONFIRMED test also successful (load was reordered or zeroing
       of ct->status not yet visible)
    3. delete_from_lists unlinks entry not on the hlist, because
       IPS_DYING_BIT is 0 (already cleared).
    
    2) is already wrong: CPU2 will handle a partially initited object
    that is supposed to be private to CPU1.
    
    Add needed barriers when refcount_inc_not_zero() is successful.
    
    It also inserts a smp_wmb() before the refcount is set to 1 during
    allocation.
    
    Because other CPU might still see the object, refcount_set(1)
    "resurrects" it, so we need to make sure that other CPUs will also observe
    the right content.  In particular, the CONFIRMED bit test must only pass
    once the object is fully initialised and either in the hash or about to be
    inserted (with locks held to delay possible unlink from early_drop or
    gc worker).
    
    I did not change flow_offload_alloc(), as far as I can see it should call
    refcount_inc(), not refcount_inc_not_zero(): the ct object is attached to
    the skb so its refcount should be >= 1 in all cases.
    
    v2: prefer smp_acquire__after_ctrl_dep to smp_rmb (Will Deacon).
    v3: keep smp_acquire__after_ctrl_dep close to refcount_inc_not_zero call
        add comment in nf_conntrack_netlink, no control dependency there
        due to locks.
    
    Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
    Link: https://lore.kernel.org/all/Yr7WTfd6AVTQkLjI@xxxxxxxxxxxxxxxxxxxxxxxxxx/
    Reported-by: Kajetan Puchalski <kajetan.puchalski@xxxxxxx>
    Diagnosed-by: Will Deacon <will@xxxxxxxxxx>
    Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api")
    Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
    Acked-by: Will Deacon <will@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9010b6e5a072..5a85735512ce 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -764,6 +764,9 @@ static void nf_ct_gc_expired(struct nf_conn *ct)
 	if (!refcount_inc_not_zero(&ct->ct_general.use))
 		return;
 
+	/* load ->status after refcount increase */
+	smp_acquire__after_ctrl_dep();
+
 	if (nf_ct_should_gc(ct))
 		nf_ct_kill(ct);
 
@@ -830,6 +833,9 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 		 */
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		if (likely(refcount_inc_not_zero(&ct->ct_general.use))) {
+			/* re-check key after refcount */
+			smp_acquire__after_ctrl_dep();
+
 			if (likely(nf_ct_key_equal(h, tuple, zone, net)))
 				goto found;
 
@@ -1369,6 +1375,9 @@ static unsigned int early_drop_list(struct net *net,
 		if (!refcount_inc_not_zero(&tmp->ct_general.use))
 			continue;
 
+		/* load ->ct_net and ->status after refcount increase */
+		smp_acquire__after_ctrl_dep();
+
 		/* kill only if still in same netns -- might have moved due to
 		 * SLAB_TYPESAFE_BY_RCU rules.
 		 *
@@ -1518,6 +1527,9 @@ static void gc_worker(struct work_struct *work)
 			if (!refcount_inc_not_zero(&tmp->ct_general.use))
 				continue;
 
+			/* load ->status after refcount increase */
+			smp_acquire__after_ctrl_dep();
+
 			if (gc_worker_skip_ct(tmp)) {
 				nf_ct_put(tmp);
 				continue;
@@ -1749,6 +1761,16 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	if (!exp)
 		__nf_ct_try_assign_helper(ct, tmpl, GFP_ATOMIC);
 
+	/* Other CPU might have obtained a pointer to this object before it was
+	 * released.  Because refcount is 0, refcount_inc_not_zero() will fail.
+	 *
+	 * After refcount_set(1) it will succeed; ensure that zeroing of
+	 * ct->status and the correct ct->net pointer are visible; else other
+	 * core might observe CONFIRMED bit which means the entry is valid and
+	 * in the hash table, but its not (anymore).
+	 */
+	smp_wmb();
+
 	/* Now it is inserted into the unconfirmed list, set refcount to 1. */
 	refcount_set(&ct->ct_general.use, 1);
 	nf_ct_add_to_unconfirmed_list(ct);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 2e9c8183e4a2..431e005ff14d 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1203,6 +1203,7 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 					   hnnode) {
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			if (nf_ct_is_expired(ct)) {
+				/* need to defer nf_ct_kill() until lock is released */
 				if (i < ARRAY_SIZE(nf_ct_evict) &&
 				    refcount_inc_not_zero(&ct->ct_general.use))
 					nf_ct_evict[i++] = ct;
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 55aa55b252b2..48812dda273b 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -306,6 +306,9 @@ static int ct_seq_show(struct seq_file *s, void *v)
 	if (unlikely(!refcount_inc_not_zero(&ct->ct_general.use)))
 		return 0;
 
+	/* load ->status after refcount increase */
+	smp_acquire__after_ctrl_dep();
+
 	if (nf_ct_should_gc(ct)) {
 		nf_ct_kill(ct);
 		goto release;



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux