On Wed, May 12, 2021 at 05:22:48PM +0200, Hannes Reinecke wrote: > On 5/12/21 8:46 AM, Luis Chamberlain wrote: > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > index d1467658361f..4fccc0fad190 100644 > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -1917,6 +1917,19 @@ config FAULT_INJECTION_USERCOPY > > Provides fault-injection capability to inject failures > > in usercopy functions (copy_from_user(), get_user(), ...). > > +config FAIL_ADD_DISK > > + bool "Fault-injection capability for add_disk() callers" > > + depends on FAULT_INJECTION && BLOCK > > + help > > + Provide fault-injection capability for the add_disk() block layer > > + call path. This allows the kernel to provide error injection when > > + the add_disk() call is made. You would use something like blktests > > + test against this or just load the null_blk driver. This only > > + enables the error injection functionality. To use it you must > > + configure which path you want to trigger on error on using debugfs > > + under /sys/kernel/debug/block/config_fail_add_disk/. By default > > + all of these are disabled. > > + > > config FAIL_MAKE_REQUEST > > bool "Fault-injection capability for disk IO" > > depends on FAULT_INJECTION && BLOCK > > > > Hmm. Not a fan of this approach. > > Having to have a separate piece of code just to test individual functions, > _and_ having to place hooks in the code to _simulate_ a failure seems rather > fragile to me. > > I would have vastly preferred if we could to this via generic tools like > ebpf or livepatching. Agreed. Now, we would then need a place to dump these as well. I guess blktets would be it for the block layer... and fstests for fs. If done with livepatching it would take a long time, consider the time added for probing modules just for a new fault injection for a few routines... how many modules.. and time. ebpf maybe. Someone is going to have to try it. Another possibility is kunit, and I think the tests would be faster. However maintained boiler place would still be needed. > Also I'm worried that this approach doesn't really scale; taken to extremes > we would have to add duplicate calls to each and every function for full > error injection, essentially double the size of the code just on the > off-chance that someone wants to do error injection. Indeed. What would be better is to have the ability to get this for free and programatically enable knobs. Now fault-injection has some ability to fail on functions dynamically but I haven't tested that. Reason I didn't go with that is we want certain functions to fail but *only* in certain context, not all the time for every caller. This approach was safer and specific to the block layer, and in fact only applicable to the add_disk() path. > So I'd rather delegate the topic of error injection to a more general > discussion (LSF springs to mind ...), and then agree on a framework which is > suitable for every function. Or we just get cranking and produce proof of concepts to compare and contrast later. At least I hope this patch and the respective blktests patches suffice to help demo what we need to test. Luis