Hi Waiman,

Thanks for your effort on trying to fix this.

On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
>  	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
>  				   &object_phys_tree_root :
>  				   &object_tree_root);
> -	list_del_rcu(&object->object_list);
> +	if (!(object->del_state & DELSTATE_NO_DELETE))
> +		list_del_rcu(&object->object_list);
> +	object->del_state |= DELSTATE_REMOVED;
>  }

So IIUC, this prevents the current object being scanned from being
removed from the list during the kmemleak_cond_resched() call.

>  /*
> @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
>  	object->count = 0;			/* white color initially */
>  	object->jiffies = jiffies;
>  	object->checksum = 0;
> +	object->del_state = 0;
>
>  	/* task information */
>  	if (in_hardirq()) {
> @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
>  	if (!get_object(object))
>  		return;	/* Try next object */
>
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		goto unlock_put;	/* Object removed */
> +	object->del_state |= DELSTATE_NO_DELETE;
> +	raw_spin_unlock_irq(&kmemleak_lock);
> +
>  	rcu_read_unlock();
>  	cond_resched();
>  	rcu_read_lock();
> +
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		list_del_rcu(&object->object_list);
> +	object->del_state &= ~DELSTATE_NO_DELETE;
> +unlock_put:
> +	raw_spin_unlock_irq(&kmemleak_lock);
>  	put_object(object);
>  }

I'm not sure this was the only problem. We do have the problem that the
current object may be removed from the list, solved above, but another
scenario I had in mind is the next object being released during this
brief resched period. The RCU list traversal relies on
object->next->next being valid but, with a brief rcu_read_unlock(),
object->next could be freed and reallocated, leaving object->next->next
invalid.

--
Catalin
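
For reference, the del_state handshake in the quoted patch can be
modelled in user space. This is only a rough sketch under stated
assumptions: struct model_object, the model_*() helpers and the flag
values are made up for illustration, locking and RCU are left out
entirely, and it covers only the first scenario (the current object
being removed), not the object->next one.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow the quoted patch; the numeric values are assumed. */
#define DELSTATE_REMOVED	0x1u
#define DELSTATE_NO_DELETE	0x2u

/* Hypothetical stand-in for struct kmemleak_object. */
struct model_object {
	unsigned int del_state;
	bool on_list;		/* models membership of object_list */
};

/* Models __create_object(): the object starts linked with no flags set. */
static void model_create(struct model_object *obj)
{
	obj->del_state = 0;
	obj->on_list = true;
}

/* Models the patched __remove_object(). */
static void model_remove(struct model_object *obj)
{
	if (!(obj->del_state & DELSTATE_NO_DELETE))
		obj->on_list = false;	/* stands in for list_del_rcu() */
	obj->del_state |= DELSTATE_REMOVED;
}

/*
 * Models the locked section before cond_resched().
 * Returns false if the object was already removed (the "goto unlock_put"
 * path), true if the object is now pinned on the list.
 */
static bool model_resched_begin(struct model_object *obj)
{
	if (obj->del_state & DELSTATE_REMOVED)
		return false;
	obj->del_state |= DELSTATE_NO_DELETE;
	return true;
}

/* Models the locked section after cond_resched(). */
static void model_resched_end(struct model_object *obj)
{
	/* Only valid after a successful model_resched_begin(). */
	assert(obj->del_state & DELSTATE_NO_DELETE);
	if (obj->del_state & DELSTATE_REMOVED)
		obj->on_list = false;	/* the deferred list_del_rcu() */
	obj->del_state &= ~DELSTATE_NO_DELETE;
}
```

Calling model_resched_begin(), then model_remove(), leaves on_list set
until model_resched_end() runs, so the scanner's current object stays
linked across the resched window. Nothing in this model pins the
neighbouring objects, which is where the second scenario comes from.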