On Sat, Dec 10, 2022 at 06:00:48PM -0500, Waiman Long wrote:
> Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first
> object iteration loop of kmemleak_scan()") fixes soft lockup problem
> in kmemleak_scan() by periodically doing a cond_resched(). It does
> take a reference of the current object before doing it. Unfortunately,
> if the object has been deleted from the object_list, the next object
> pointed to by its next pointer may no longer be valid after coming
> back from cond_resched(). This can result in use-after-free and other
> nasty problem.

Ah, kmemleak_cond_resched() releases the rcu lock, so using
list_for_each_entry_rcu() doesn't help.

> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index 8c44f70ed457..d3a8fa4e3af3 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1465,15 +1465,26 @@ static void scan_gray_list(void)
>   * that the given object won't go away without RCU read lock by performing a
>   * get_object() if necessaary.
>   */
> -static void kmemleak_cond_resched(struct kmemleak_object *object)
> +static void kmemleak_cond_resched(struct kmemleak_object **pobject)
>  {
> -	if (!get_object(object))
> +	struct kmemleak_object *obj = *pobject;
> +
> +	if (!(obj->flags & OBJECT_ALLOCATED) || !get_object(obj))
>  		return;	/* Try next object */

I don't think we can rely on obj->flags without holding obj->lock. We do
have a few WARN_ON() checks without the lock but in all other places the
lock should be held.

Another potential issue with re-scanning is that the loop may never
complete if it always goes from the beginning. Yet another problem with
restarting is that we may count references to an object multiple times
and get more false negatives.

I'd keep the OBJECT_ALLOCATED logic in the main kmemleak_scan() loop and
retake the object->lock if cond_resched() was called
(kmemleak_need_resched() returning true), check if it was freed and
restart the loop.
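
A rough userspace model of that check-and-restart logic, purely as an
illustration. This is not kernel code: there is no RCU, no object->lock
and no refcounting, and every name in it (model_object, model_list,
model_delete(), model_scan(), MODEL_ALLOCATED) is made up for the
sketch. The "sleep" is simulated by deleting a chosen object at a chosen
point in the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for OBJECT_ALLOCATED; not the kernel's definition. */
#define MODEL_ALLOCATED 0x1u

struct model_object {
	unsigned int flags;
	int visits;			/* times the loop body ran on this object */
	struct model_object *next;
};

struct model_list {
	struct model_object *head;
};

/* Unlink victim and mark it freed, as a concurrent delete would. */
static void model_delete(struct model_list *list, struct model_object *victim)
{
	struct model_object **pp = &list->head;

	assert(victim != NULL);
	while (*pp && *pp != victim)
		pp = &(*pp)->next;
	if (*pp)
		*pp = victim->next;
	victim->flags &= ~MODEL_ALLOCATED;
	victim->next = NULL;	/* following this pointer would be the bug */
}

/*
 * Walk the list. When the walk reaches sleep_on, simulate one reschedule
 * during which victim (may be NULL) is deleted. Afterwards re-check the
 * current object: if it was freed, its next pointer cannot be trusted,
 * so start again from the head. Returns the number of restarts.
 */
static int model_scan(struct model_list *list,
		      struct model_object *sleep_on,
		      struct model_object *victim)
{
	struct model_object *obj;
	int restarts = 0;
	bool slept = false;

restart:
	for (obj = list->head; obj; obj = obj->next) {
		obj->visits++;

		if (obj == sleep_on && !slept) {
			slept = true;
			/* stands in for dropping RCU and cond_resched() */
			if (victim)
				model_delete(list, victim);
			/* stands in for retaking the lock and checking */
			if (!(obj->flags & MODEL_ALLOCATED)) {
				restarts++;
				goto restart;
			}
		}
	}
	return restarts;
}
```

With a list a -> b -> c where b is deleted while the walk sleeps on it,
model_scan() restarts once and a is visited a second time. If a
different object is deleted instead, the current object is still on the
list, its next pointer has been updated by the delete, and the walk just
carries on.
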
We could add a new OBJECT_SCANNED flag so that we skip such objects if
we restarted the loop. The flag is reset during list preparation.

I wonder whether we actually need the cond_resched() in the first loop.
It does take a lot of locks but it doesn't scan the objects. I had a
patch around to remove the fine-grained locking in favour of the big
kmemleak_lock, it would make this loop faster (not sure what happened to
that patch, I need to dig it out).

-- 
Catalin