Re: [PATCH v1 8/8] block: add add_disk() failure injection support

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Wed, 12 May 2021 16:56:39 +0000

On Wed, May 12, 2021 at 05:22:48PM +0200, Hannes Reinecke wrote:
> On 5/12/21 8:46 AM, Luis Chamberlain wrote:
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index d1467658361f..4fccc0fad190 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1917,6 +1917,19 @@ config FAULT_INJECTION_USERCOPY
> >   	  Provides fault-injection capability to inject failures
> >   	  in usercopy functions (copy_from_user(), get_user(), ...).
> > +config FAIL_ADD_DISK
> > +	bool "Fault-injection capability for add_disk() callers"
> > +	depends on FAULT_INJECTION && BLOCK
> > +	help
> > +	  Provide fault-injection capability for the add_disk() block layer
> > +	  call path. This allows the kernel to provide error injection when
> > +	  the add_disk() call is made. You would use something like blktests
> > +	  test against this or just load the null_blk driver. This only
> > +	  enables the error injection functionality. To use it you must
> > +	  configure which path you want to trigger on error on using debugfs
> > +	  under /sys/kernel/debug/block/config_fail_add_disk/. By default
> > +	  all of these are disabled.
> > +
> >   config FAIL_MAKE_REQUEST
> >   	bool "Fault-injection capability for disk IO"
> >   	depends on FAULT_INJECTION && BLOCK
> > 
> 
> Hmm. Not a fan of this approach.
> 
> Having to have a separate piece of code just to test individual functions,
> _and_ having to place hooks in the code to _simulate_ a failure seems rather
> fragile to me.
> 
> I would have vastly preferred if we could to this via generic tools like
> ebpf or livepatching.

Agreed. Now, we would then need a place to dump these as well. I guess
blktets would be it for the block layer... and fstests for fs. If done
with livepatching it would take a long time, consider the time added for
probing modules just for a new fault injection for a few routines...
how many modules.. and time.

ebpf maybe. Someone is going to have to try it.

Another possibility is kunit, and I think the tests would be faster.
However maintained boiler place would still be needed.

> Also I'm worried that this approach doesn't really scale; taken to extremes
> we would have to add duplicate calls to each and every function for full
> error injection, essentially double the size of the code just on the
> off-chance that someone wants to do error injection.

Indeed. What would be better is to have the ability to get this for
free and programatically enable knobs. Now fault-injection has some
ability to fail on functions dynamically but I haven't tested that.
Reason I didn't go with that is we want certain functions to fail but
*only* in certain context, not all the time for every caller. This
approach was safer and specific to the block layer, and in fact
only applicable to the add_disk() path.

> So I'd rather delegate the topic of error injection to a more general
> discussion (LSF springs to mind ...), and then agree on a framework which is
> suitable for every function.

Or we just get cranking and produce proof of concepts to compare and
contrast later. At least I hope this patch and the respective blktests
patches suffice to help demo what we need to test.

  Luis