Re: [PATCH 2/5] mm/slub: restrict sysfs validation to debug caches and make it safe

On 8/14/22 16:39, Hyeonggon Yoo wrote:
> On Fri, Aug 12, 2022 at 11:14:23AM +0200, Vlastimil Babka wrote:
>> Rongwei Wang reports [1] that cache validation triggered by writing to
>> /sys/kernel/slab/<cache>/validate is racy against normal cache
>> operations (e.g. freeing) in a way that can cause false positive
>> inconsistency reports for caches with debugging enabled. The problem is
>> that debugging actions that mark object free or active and actual
>> freelist operations are not atomic, and the validation can see an
>> inconsistent state.
>> 
>> For caches that do or don't have debugging enabled, additional races
>> involving n->nr_slabs are possible that result in false reports of wrong
>> slab counts.
>> 
>> This patch attempts to solve these issues while not adding overhead to
>> normal (especially fastpath) operations for caches that do not have
>> debugging enabled. Such overhead would not be justified to make possible
>> userspace-triggered validation safe. Instead, disable the validation for
>> caches that don't have debugging enabled and make their sysfs validate
>> handler return -EINVAL.
>> 
>> For caches that do have debugging enabled, we can instead extend the
>> existing approach of not using percpu freelists to force all alloc/free
>> perations to the slow paths where debugging flags are checked and acted
>> upon. Then we can adjust the debug-specific paths to increase n->list_lock
>> coverage against concurrent validation as necessary.
> 
> s/perations/operations

OK
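
For reference, a minimal sketch of the sysfs behaviour the commit message
describes: validation is only attempted when the cache has debugging
enabled, otherwise -EINVAL is returned to userspace. Names follow the
existing validate_store() / validate_slab_cache() / kmem_cache_debug()
code; the exact final form in the patch may differ.

	static ssize_t validate_store(struct kmem_cache *s,
				const char *buf, size_t length)
	{
		int ret = -EINVAL;

		/*
		 * Only debug caches force all alloc/free operations to the
		 * slow paths, so only for them is validation safe against
		 * concurrent operations.
		 */
		if (buf[0] == '1' && kmem_cache_debug(s)) {
			ret = validate_slab_cache(s);
			if (ret >= 0)
				ret = length;
		}
		return ret;
	}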

>> @@ -1604,9 +1601,9 @@ static inline
>>  void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>>  
>>  static inline int alloc_debug_processing(struct kmem_cache *s,
>> -	struct slab *slab, void *object, unsigned long addr) { return 0; }
>> +	struct slab *slab, void *object) { return 0; }
>>  
>> -static inline int free_debug_processing(
>> +static inline void free_debug_processing(
>>  	struct kmem_cache *s, struct slab *slab,
>>  	void *head, void *tail, int bulk_cnt,
>>  	unsigned long addr) { return 0; }
> 
> IIRC, as reported by the bot on an earlier patch, a void function
> should not return 0;

OK
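
Presumably the !CONFIG_SLUB_DEBUG stub then simply drops the return
value, e.g.:

	static inline void free_debug_processing(
		struct kmem_cache *s, struct slab *slab,
		void *head, void *tail, int bulk_cnt,
		unsigned long addr) {}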

>> +/*
>> + * Called only for kmem_cache_debug() caches to allocate from a freshly
>> + * allocated slab. Allocate a single object instead of whole freelist
>> + * and put the slab to the partial (or full) list.
>> + */
>> +static void *alloc_single_from_new_slab(struct kmem_cache *s,
>> +					struct slab *slab)
>> +{
>> +	int nid = slab_nid(slab);
>> +	struct kmem_cache_node *n = get_node(s, nid);
>> +	unsigned long flags;
>> +	void *object;
>> +
>> +	spin_lock_irqsave(&n->list_lock, flags);
>> +
>> +	object = slab->freelist;
>> +	slab->freelist = get_freepointer(s, object);
>> +	slab->inuse = 1;
>> +
>> +	if (!alloc_debug_processing(s, slab, object)) {
>> +		/*
>> +		 * It's not really expected that this would fail on a
>> +		 * freshly allocated slab, but a concurrent memory
>> +		 * corruption in theory could cause that.
>> +		 */
>> +		spin_unlock_irqrestore(&n->list_lock, flags);
>> +		return NULL;
>> +	}
>> +
> 
> Nit: spin_lock_irqsave() could be moved down to here, as a freshly
> allocated slab has no other reference.

Right.
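
I.e. something along these lines (sketch only, per the nit above; the
fresh slab is not yet on any list, so the freelist manipulation and the
debug checks need no lock):

	object = slab->freelist;
	slab->freelist = get_freepointer(s, object);
	slab->inuse = 1;

	if (!alloc_debug_processing(s, slab, object))
		/* not really expected to fail on a freshly allocated slab */
		return NULL;

	/* take the lock only for the partial/full list manipulation below */
	spin_lock_irqsave(&n->list_lock, flags);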

>> +out:
>> +	if (checks_ok) {
>> +		void *prior = slab->freelist;
>> +
>> +		/* Perform the actual freeing while we still hold the locks */
>> +		slab->inuse -= cnt;
>> +		set_freepointer(s, tail, prior);
>> +		slab->freelist = head;
>> +
>> +		/* Do we need to remove the slab from full or partial list? */
>> +		if (!prior) {
>> +			remove_full(s, n, slab);
>> +		} else if (slab->inuse == 0) {
>> +			remove_partial(n, slab);
>> +			stat(s, FREE_REMOVE_PARTIAL);
>> +		}
>> +
>> +		/* Do we need to discard the slab or add to partial list? */
>> +		if (slab->inuse == 0) {
>> +			slab_free = slab;
>> +		} else if (!prior) {
>> +			add_partial(n, slab, DEACTIVATE_TO_TAIL);
>> +			stat(s, FREE_ADD_PARTIAL);
>> +		}
>> +	}
>> +
>> +	if (slab_free) {
>> +		/*
>> +		 * Update the counters while still holding n->list_lock to
>> +		 * prevent spurious validation warnings
>> +		 */
>> +		dec_slabs_node(s, slab_nid(slab_free), slab_free->objects);
>> +	}
> 
> This looks good but maybe kmem_cache_shrink() can lead to
> spurious validation warnings?

Good catch, I'll fix that too.
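
One possible shape of that fix (a sketch only, with the loop body
abbreviated; names follow the existing __kmem_cache_do_shrink() code):
do the dec_slabs_node() for empty slabs while still holding
n->list_lock, and only free the collected slabs after dropping it.

	spin_lock_irqsave(&n->list_lock, flags);

	list_for_each_entry_safe(slab, t, &n->partial, slab_list) {
		if (!slab->inuse) {
			/*
			 * Empty slab: unlink it and update the counters
			 * under the lock so a concurrent validation never
			 * sees n->nr_slabs out of sync.
			 */
			list_move(&slab->slab_list, &discard);
			n->nr_partial--;
			dec_slabs_node(s, node, slab->objects);
		}
	}

	spin_unlock_irqrestore(&n->list_lock, flags);

	/* release the discarded slabs without holding the lock */
	list_for_each_entry_safe(slab, t, &discard, slab_list)
		free_slab(s, slab);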

> 
> Otherwise looks good to me!
> 

Thanks!



