Re: [PATCH v2] mm: Make failslab writable again

Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx> · Thu, 29 Sep 2022 00:21:10 +0900

On Tue, Sep 27, 2022 at 10:44:20AM +0300, Alexander Atanasov wrote:
> Hello,
> 
> On 27.09.22 3:49, Hyeonggon Yoo wrote:
> > On Fri, Sep 23, 2022 at 10:34:28AM +0300, Alexander Atanasov wrote:
> > > Hello,
> > > 
> > > On 21.09.22 14:30, Hyeonggon Yoo wrote:
> > > > On Tue, Sep 20, 2022 at 03:11:11PM +0300, Alexander Atanasov wrote:
> > > > > In (060807f841ac mm, slub: make remaining slub_debug related attributes
> > > > > read-only) failslab was made read-only.
> > > > > I think it became a collateral victim to the two other options for which
> > > > > the reasons are perfectly valid.
> > > > > Here is why:
> > > > >    - sanity_checks and trace are slab internal debug options,
> > > > >      failslab is used for fault injection.
> > > > >    - for fault injections, which by presumption are random, it
> > > > >      does not matter if it is not set atomically. And you need to
> > > > >      set atleast one more option to trigger fault injection.
> > > > >    - in a testing scenario you may need to change it at runtime
> > > > >      example: module loading - you test all allocations limited
> > > > >      by the space option. Then you move to test only your module's
> > > > >      own slabs.
> > > > >    - when set by command line flags it effectively disables all
> > > > >      cache merges.
> > > > 
> > > > Maybe we can make failslab= boot parameter to consider cache filtering?
> > > > 
> > > > With that, just pass something like this:
> > > > 	failslab=X,X,X,X,cache_filter slub_debug=A,<cache-name>>
> > > 
> > > > Users should pass slub_debug=A,<cache-name> anyway to prevent cache merging.
> > > 
> > > It will be good to have this in case you want to test cache that is used
> > > early. But why push something to command line option only when it can be
> > > changed at runtime?
> > 
> > Hmm okay. I'm not against changing it writable. (it looks okay to me.)
> 
> Okay. Good to know that.
> 
> > Just wanted to understand your use case!
> > Can you please elaborate why booting with slub_debug=A,<your cache name>
> > and enabling cache_filter after boot does not work?
> 
> I didn't say it does not work - it does work but requires reboot. You may
> want to test variations of caches for example. Cache A, Cache B ... C and so
> on one by one. Reboots might be fast these days with VMs but you may not be
> able to test everything in a VM. And ... reboots used to be the signature
> move of one Other OS.

Thank you for elaboration!
Makes sense.

> 
> > Or is it trying to changnig these steps,
> > 
> > FROM
> > 	1. booting with slub_debug=A,<cache name>
> > 	2. write to cache_filter to enable cache filtering
> > 	3. setup probability, interval, times, size
> > 
> > TO
> > 
> > 	1. write to failslab attribute of <cache name> (may fail it has alias)
> > 	2. write to cache_filter to enable cache filtering
> > 	3. setup probability, interval, times, size
> > ?
> > 
> > as you may know, SLAB_FAILSLAB does nothing whens
> > cache_filter is disabled, and you should pass slub_debug=A,<cache name> anyway
> 
> Okay , i think there awaits another problem:
> bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags)
> {
> ...
> 
>         if (failslab.cache_filter && !(s->flags & SLAB_FAILSLAB))
>                 return false;
> ...
> 	return should_fail(&failslab.attr, s->object_size);
> }
> 
> So if you do not have cache_filter set ... you go to should_fail for all
> slabs.

Yes.

> I've been hit by that and spend a lot of time trying to understand why i got
> crashes at random places. And the reason was that i read an old
> documentation that said cache_filter is writable and i blindly wrote 1 to
> it.
>
> If the intent is to only work with cache filter set - then i will update
> the patch to do so.

You mean to set cache_filter to true when writing to 'failslab',
or when setting SLAB_FAILSLAB slab flag?

I'm not so confident for that because it's implicitly changing.
Maybe more documentation would be proper?

what do you think, Vlastimil?

> This is the only place where SLAB_FAILSLAB is explicitly
> tested, other places check it as part of SLAB_NEVER_MERGE.
> 
> But even for all caches it is kind of possible to test with size(space)
> which is in turn useful because you need to figure out how you handle
> failures from external caches - external to your code under test and you
> don't want to keep track for all of them (same goes for too much options in
> command line). 

Yeah, we should be able to inject fault in all caches, or a specific
cache(s).

> > to prevent doing cache merging with <cache name>.
> 
> Or you can pass SLAB_FAILSLAB from your module when creating the cache to
> prevent merge when under test.

Right. I missed that.

> 
> 
> -- 
> Regards,
> Alexander Atanasov
> 

-- 
Thanks,
Hyeonggon