Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free

Rongwei Wang <rongwei.wang@xxxxxxxxxxxxxxxxx> · Tue, 31 May 2022 16:50:41 +0800

On 5/29/22 7:37 PM, Hyeonggon Yoo wrote:
On Sun, May 29, 2022 at 04:15:33PM +0800, Rongwei Wang wrote:
In use cases where slab objects are allocated and freed frequently,
error messages such as "Left Redzone overwritten" and "First byte
0xbb instead of 0xcc" can be printed when validating slabs.
This happens because an object has been filled with SLUB_RED_INACTIVE
but has not yet been added to the slab's freelist, and slab
validation can run between these two states.

This does not mean the slab is unstable, but these confusing
messages do interfere with slab debugging.

Signed-off-by: Rongwei Wang <rongwei.wang@xxxxxxxxxxxxxxxxx>

Have you observed it, or is it from code inspection?
Hi Hyeonggon,

I built a module to trigger the race:

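#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/module.h>
#include <linux/signal.h>
#include <linux/slab.h>

#define SLUB_KTHREAD_MAX 1
static struct task_struct *slub_thread[SLUB_KTHREAD_MAX];

/* Allocate and free a kmalloc-2048 object in a tight loop. */
static int do_slub_alloc(void *data)
{
        char *mm;

        allow_signal(SIGTERM);

        while (!kthread_should_stop()) {
                mm = kmalloc(2048, GFP_KERNEL);
                if (mm) {
                        mm[0x100] = 0x21;
                        kfree(mm);
                }
        }

        return 0;
}

static int __init mini_init(void)
{
        int i;

        /* Threads are named do_slub_00, do_slub_01, ... */
        for (i = 0; i < SLUB_KTHREAD_MAX; i++)
                slub_thread[i] = kthread_run(do_slub_alloc, NULL,
                                             "do_slub_%02d", i);

        return 0;
}
module_init(mini_init);

/* Stop the worker threads so that the module can be unloaded. */
static void __exit mini_exit(void)
{
        int i;

        for (i = 0; i < SLUB_KTHREAD_MAX; i++)
                if (!IS_ERR_OR_NULL(slub_thread[i]))
                        kthread_stop(slub_thread[i]);
}
module_exit(mini_exit);

MODULE_LICENSE("GPL");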

On my system, I added 'slub_debug=UFPZ' to the boot options. The error
messages are then printed when I run "slabinfo -v" or
"echo 1 > /sys/kernel/slab/kmalloc-2048/validate".


---
  mm/slub.c | 40 +++++++++++++++++-----------------------
  1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ed5c2c03a47a..310e56d99116 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1374,15 +1374,12 @@ static noinline int free_debug_processing(
  	void *head, void *tail, int bulk_cnt,
  	unsigned long addr)
  {
-	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
  	void *object = head;
  	int cnt = 0;
-	unsigned long flags, flags2;
+	unsigned long flags;
  	int ret = 0;
  
-	spin_lock_irqsave(&n->list_lock, flags);
-	slab_lock(slab, &flags2);
-
+	slab_lock(slab, &flags);
  	if (s->flags & SLAB_CONSISTENCY_CHECKS) {
  		if (!check_slab(s, slab))
  			goto out;
@@ -1414,8 +1411,7 @@ static noinline int free_debug_processing(
  		slab_err(s, slab, "Bulk freelist count(%d) invalid(%d)\n",
  			 bulk_cnt, cnt);
  
-	slab_unlock(slab, &flags2);
-	spin_unlock_irqrestore(&n->list_lock, flags);
+	slab_unlock(slab, &flags);
  	if (!ret)
  		slab_fix(s, "Object at 0x%p not freed", object);
  	return ret;
@@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
  
  {
  	void *prior;
-	int was_frozen;
+	int was_frozen, to_take_off = 0;
  	struct slab new;
  	unsigned long counters;
  	struct kmem_cache_node *n = NULL;
@@ -3315,15 +3311,19 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
  	if (kfence_free(head))
  		return;
  
+	n = get_node(s, slab_nid(slab));
+	spin_lock_irqsave(&n->list_lock, flags);
+

Oh please don't do this.

SLUB free slowpath can be hit a lot depending on workload.
Thanks for the reminder. What I did was move the n->list_lock that was
originally taken in free_debug_processing() out to its caller. The
change looks small, but it will indeed degrade performance to some extent.

Do you have any other ideas? :)

-wrw

__slab_free() tries its best not to take n->list_lock. Currently it takes
n->list_lock only when the slab needs to be taken off a list.

Unconditionally taking n->list_lock will degrade performance.

  	if (kmem_cache_debug(s) &&
-	    !free_debug_processing(s, slab, head, tail, cnt, addr))
+	    !free_debug_processing(s, slab, head, tail, cnt, addr)) {
+
+		spin_unlock_irqrestore(&n->list_lock, flags);
  		return;
+	}
  
  	do {
-		if (unlikely(n)) {
-			spin_unlock_irqrestore(&n->list_lock, flags);
-			n = NULL;
-		}
+		if (unlikely(to_take_off))
+			to_take_off = 0;
  		prior = slab->freelist;
  		counters = slab->counters;
  		set_freepointer(s, tail, prior);
@@ -3343,18 +3343,11 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
  				new.frozen = 1;
  
  			} else { /* Needs to be taken off a list */
-
-				n = get_node(s, slab_nid(slab));
  				/*
-				 * Speculatively acquire the list_lock.
  				 * If the cmpxchg does not succeed then we may
-				 * drop the list_lock without any processing.
-				 *
-				 * Otherwise the list_lock will synchronize with
-				 * other processors updating the list of slabs.
+				 * drop this behavior without any processing.
  				 */
-				spin_lock_irqsave(&n->list_lock, flags);
-
+				to_take_off = 1;
  			}
  		}
  
@@ -3363,8 +3356,9 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
  		head, new.counters,
  		"__slab_free"));
  
-	if (likely(!n)) {
+	if (likely(!to_take_off)) {
  
+		spin_unlock_irqrestore(&n->list_lock, flags);
  		if (likely(was_frozen)) {
  			/*
  			 * The list lock was not taken therefore no list

--
2.27.0