Re: How to increase rcu_head size without breaking -mm ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/25/22 17:00, Joel Fernandes wrote:
> On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
>> 
>> On 8/25/22 09:50, Vlastimil Babka wrote:
>> > 
>> > On 8/25/22 09:13, Vlastimil Babka wrote:
>> >> On 8/25/22 04:56, Joel Fernandes wrote:
>> >>> Hi,
>> >>> I am trying to add some debug information to rcu_head for some patches, and need
>> >>> your help! At least this used to work some time ago (> 1 year).
>> >> 
>> >> Hmm, got a patch of how it worked back then? Because what I see (in v5.12)
>> >> struct page definition:
>> >> 
>> >> 	struct rcu_head rcu_head;
>> >> 
>> >> is in union with the whole of
>> >> 
>> >> 	struct {        /* slab, slob and slub */ ... }
>> >> 
>> >> which starts with
>> >> 
>> >> 	union { struct list_head slab_list;  ... }
>> >> 	struct kmem_cache *slab_cache; /* not slob */
>> >> 
>> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
>> >> 
>> >>         page = container_of(head, struct page, rcu_head);
>> >>         cachep = page->slab_cache;
>> >> 
>> >> and very similar thing in mm/slub.c rcu_free_slab() - we assume that the
>> >> rcu_head and slab_cache fields can coexist.
>> >> 
>> >> So this looks silently fragile to me? In case rcu_head becomes larger than
>> >> list_head, e.g. with your buf[4], it would start overlaping the
>> >> page->slab_cache field, no?
>> >> 
>> >> AFAICS the way it's structured now in struct slab in mm/slab.h, it makes the
>> >> overlap impossible, thus the struct slab will grow as rcu_head grows, and
>> >> offsets of the later fields stop matching struct page counterparts, which
>> >> doesn't grow. Hmm.
>> > 
>> > The following has made things compile for both SLUB and SLAB (didn't adjust
>> > SLOB), the idea is to define more explicitly that everything can overlap
>> > with rcu_head except slab_cache. So the structures don't even grow in the
>> > end. Which requires moving slab_cache field around. Which would be fine
>> > except for SLUB and its cmpxchg_double usage, which requires the
>> > freelist+counter fields to be 128bit aligned, which means we can't fit it
>> > all with these constraints. So that's broken with the patch and I don't see
>> > an easy way to fix this.
>> 
>> Ah maybe I found a way. It requires that rcu_head in struct slab would be at
>> different offset in struct page, and we drop the particular SLAB_MATCH
>> check. I think it should be fine as we don't cast struct slab to struct page
>> in the rcu calls/callbacks so it was probably there just for intermediate
>> steps of the struct slab series, and can be dropped now.
>> 
>> So this should allow you to use up to 32 bytes of rcu_head.
> 
> Thanks a lot, Vlastimil! I can confirm it builds, some comments below:
> 
>> ----8<----
>> diff --git a/include/linux/types.h b/include/linux/types.h
>> index ea8cf60a8a79..daf7682a7a3b 100644
>> --- a/include/linux/types.h
>> +++ b/include/linux/types.h
>> @@ -220,6 +220,7 @@ struct ustat {
>>  struct callback_head {
>>  	struct callback_head *next;
>>  	void (*func)(struct callback_head *head);
>> +	char buf[4];
>>  } __attribute__((aligned(sizeof(void *))));
>>  #define rcu_head callback_head
>>  
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 4ec82bec15ec..dd49851f9814 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -12,37 +12,45 @@ struct slab {
>>  #if defined(CONFIG_SLAB)
>>  
>>  	union {
>> -		struct list_head slab_list;
>> +		struct {
>> +			struct list_head slab_list;
>> +			void *freelist;	/* array of free object indexes */
>> +			void *s_mem;	/* first object */
>> +		};
>>  		struct rcu_head rcu_head;
> 
> Does this present any problems for SLAB_TYPESAFE_BY_RCU such as the slab
> being in use before the end of a grace-period? I was concerned about the
> freelist being corrupted by the rcu_head storage, since its now in a union
> with it. If this does not matter, then that's all good.

Good point, I didn't check that. If we're freeing the slab then nothing
should be using it at that point, including the freelist pointer. Only the
actual object storage is guaranteed not to go away or become overwritten for
SLAB_TYPESAFE_BY_RCU caches.

But we need to check the slab freeing functions themselves.
It looks like SLAB is fine, slab_destroy() there takes care to copy the
freelist pointer away before call_rcu() already and then it only works with
the copy. Also nothing should need s_mem either, in case you need more bytes
for the rcu_head debug.

SLUB seems also fine as the freeing doesn't try to look at the freelist even
in case of debugging consistency checks being enabled. It could be a problem
in case of a double-free, but that's already a problem anyway.
In case you need more than the 8 bytes of freelist pointer, then overwriting
the next 'counters' union would be a problem for the consistency checks as
it reads slab->objects from there.
But I guess we could do those checks before call_rcu, so the callback would
then only free the slab page. The state expected by those checks doesn't
depend on the grace period expiring.

>> @@ -66,9 +74,13 @@ struct slab {
>>  #define SLAB_MATCH(pg, sl)						\
>>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>>  SLAB_MATCH(flags, __page_flags);
>> +#ifdef CONFIG_SLUB
>> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>> +#else
>>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>> +#endif
>>  #ifndef CONFIG_SLOB
>> -SLAB_MATCH(rcu_head, rcu_head);
>> +// SLAB_MATCH(rcu_head, rcu_head); /* not necessary, hopefully? */
> 
> I think if it helps, I can make this check as a dependency on a default-off
> debug config option.

It wouldn't help as with this patch, for SLUB the rcu_head would start at
different offset regardless of whether you enable the option to increase its
size.

> Tested-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> 
> thanks,
> 
>  - Joel
> 




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux