Re: [PATCH 2/2] moderr: add module error injection tool

Lucas De Marchi <lucas.demarchi@xxxxxxxxx> · Wed, 19 Feb 2025 14:17:48 -0600

On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote:
On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote:
On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@xxxxxxxxxxx> wrote:
>
> Add support for a module error injection tool. The tool
> can inject errors in the annotated module kernel functions
> such as complete_formation(), do_init_module() and
> module_enable_rodata_ro_after_init(). Module name and module function are
> required parameters to have control over the error injection.
>
> Example: Inject error -22 to module_enable_rodata_ro_after_init for
> brd module:
>
> sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \
> --error=-22 --trace
> Monitoring module error injection... Hit Ctrl-C to end.
> MODULE     ERROR FUNCTION
> brd        -22   module_enable_rodata_after_init()
>
> Kernel messages:
> [   89.463690] brd: module loaded
> [   89.463855] brd: module_enable_rodata_ro_after_init() returned -22,
> ro_after_init data might still be writable
>
> Signed-off-by: Daniel Gomez <da.gomez@xxxxxxxxxxx>
> ---
>  tools/bpf/Makefile            |  13 ++-
>  tools/bpf/moderr/.gitignore   |   2 +
>  tools/bpf/moderr/Makefile     |  95 +++++++++++++++++
>  tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++
>  tools/bpf/moderr/moderr.c     | 236 ++++++++++++++++++++++++++++++++++++++++++
>  tools/bpf/moderr/moderr.h     |  40 +++++++
>  6 files changed, 510 insertions(+), 3 deletions(-)

The tool looks useful, but we don't add tools to the kernel repo.
It has to stay out of tree.

For selftests we do add random tools.

The value of error injection is not clear to me.

It is of great value, since it deals with corner cases which are
otherwise hard to reproduce in places where a real error can be
catastrophic.

Other places in the kernel use it to test paths that are difficult
to exercise otherwise.

Right.

These 3 functions don't seem to be in this category.

That's the key here we should focus on. The problem is when a maintainer
*does* agree that adding an error injection entry is useful for testing,
and we have a developer willing to do the work to help test / validate
it. In this case, this error case is rare but we do want to strive to
test this as we ramp up and extend our modules selftests.

Then there is the aspect of how to mitigate how intrusive the code changes
to allow error injection are. In 2021 we evaluated in-kernel error
injection for other areas like the block layer for add_disk() failures
[0], but even the minimal interface to enable this from userspace with
debugfs was considered too intrusive.

This effort tried to evaluate what this could look like with eBPF to
mitigate the required in-kernel code, and I believe the lightweight
nature of it, requiring just a sprinkle of ALLOW_ERROR_INJECTION(),
suffices for my taste.

So, perhaps the tools aspect can just go in:

tools/testing/selftests/module/

but why would it be module-specific? Based on its current implementation
and discussion about inject.py it seems to be generic enough to be
useful to test any function annotated with ALLOW_ERROR_INJECTION().

As xe driver maintainer, it may be interesting to use such a tool:

	$ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l  
	23

How does this approach compare to writing the function name on debugfs
(the current approach in xe's testsuite)?

	fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure
	https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108
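
For comparison, here is a rough sketch of what the debugfs side boils
down to, following the fail_function example in the fault-injection
document linked above. It assumes CONFIG_FUNCTION_ERROR_INJECTION and
CONFIG_FAIL_FUNCTION are enabled, debugfs is mounted at
/sys/kernel/debug, and the caller is root. The helper names
(write_attr, format_retval, fail_function_arm, fail_function_disarm)
are made up for illustration and are not taken from IGT or from moderr.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Default location; assumes debugfs is mounted at /sys/kernel/debug. */
#define FAIL_FUNCTION_DIR "/sys/kernel/debug/fail_function"

/* Write a short string to <dir>/<file>. Returns 0 on success, -1 on error. */
int write_attr(const char *dir, const char *file, const char *val)
{
	char path[512];
	FILE *f;
	int n, ret = 0;

	n = snprintf(path, sizeof(path), "%s/%s", dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)	/* write errors may only surface on close */
		ret = -1;
	return ret;
}

/* Format an errno-style value as 64-bit hex, like `printf %#x -12` in the docs. */
int format_retval(int err, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%#" PRIx64, (uint64_t)(int64_t)err);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Arm injection so that every call to func returns err until disarmed. */
int fail_function_arm(const char *dir, const char *func, int err)
{
	char retval_file[256], retval[32];
	int n;

	if (format_retval(err, retval, sizeof(retval)))
		return -1;
	n = snprintf(retval_file, sizeof(retval_file), "%s/retval", func);
	if (n < 0 || (size_t)n >= sizeof(retval_file))
		return -1;

	/* Writing the name to "inject" makes the kernel create <func>/retval. */
	if (write_attr(dir, "inject", func) ||
	    write_attr(dir, retval_file, retval) ||
	    write_attr(dir, "task-filter", "N") ||
	    write_attr(dir, "probability", "100") ||
	    write_attr(dir, "interval", "0") ||
	    write_attr(dir, "times", "-1"))
		return -1;
	return 0;
}

/* Disarm by clearing the injection list, as `echo > inject` does. */
int fail_function_disarm(const char *dir)
{
	return write_attr(dir, "inject", "\n");
}
```

A test would call fail_function_arm(FAIL_FUNCTION_DIR, "<function>", -22)
before triggering the code path and fail_function_disarm(FAIL_FUNCTION_DIR)
afterwards. As far as I can tell this interface has no per-module
filter, which is where the --modname option of the proposed tool seems
to differ.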

If you decide to have the tool live somewhere else, then the kmod repo
could be a candidate, although I think having it in the kernel tree is
simpler maintenance-wise.

Lucas De Marchi

[0] https://www.spinics.net/lists/linux-block/msg68159.html

 Luis