Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Vlastimil Babka <vbabka@xxxxxxx> · Mon, 17 Jun 2024 23:19:00 +0200

On 6/17/24 7:04 PM, Jason A. Donenfeld wrote:
>>> Vlastimil, this is just checking a boolean (which could be
>>> unlikely()'d), which should have pretty minimal overhead. Is that
>>> alright with you?
>>
>> Well I doubt we can just set and check it without any barriers? The
>> completion of the last pending kfree_rcu() might race with
>> kmem_cache_destroy() in a way that will leave the cache there forever, no?
>> And once we add barriers it becomes a perf issue?
> 
> Hm, yea you might be right about barriers being required. But actually,
> might this point toward a larger problem with no matter what approach,
> polling or event, is chosen? If the current rule is that
> kmem_cache_free() must never race with kmem_cache_destroy(), because

Yes calling alloc/free operations that race with destroy is a bug and we
can't prevent that.

> users have always made diligent use of call_rcu()/rcu_barrier() and

But the issue we are solving here is a bit different - the users are not
buggy, they do kfree_rcu() and then kmem_cache_destroy() and no more
operations on the cache afterwards. We need to ensure that the handling
of kfree_rcu() (which ultimately is basically kmem_cache_free() but
internally to rcu/slub) doesn't race with kmem_cache_destroy().

> such, but now we're going to let those race with each other - either by
> my thing above or by polling - so we're potentially going to get in trouble
> and need some barriers anyway. 

The barrier in the async part of kmem_cache_destroy() should be enough
to make sure all kfree_rcu() have finished before we proceed with the
potentially racy parts of destroying, and we should be able to avoid
changes in kmem_cache_free().

> I think?
> 
> Jason