On Fri, Feb 21, 2025 at 12:15:40PM +0100, Luis Chamberlain wrote:
> On Wed, Feb 19, 2025 at 02:17:48PM -0600, Lucas De Marchi wrote:
> > On Tue, Jan 28, 2025 at 12:57:05PM -0800, Luis Chamberlain wrote:
> > > On Wed, Jan 22, 2025 at 09:02:19AM -0800, Alexei Starovoitov wrote:
> > > > On Wed, Jan 22, 2025 at 5:12 AM Daniel Gomez <da.gomez@xxxxxxxxxxx> wrote:
> > > > >
> > > > > Add support for a module error injection tool. The tool
> > > > > can inject errors in the annotated module kernel functions
> > > > > such as complete_formation(), do_init_module() and
> > > > > module_enable_rodata_after_init(). Module name and module function are
> > > > > required parameters to have control over the error injection.
> > > > >
> > > > > Example: Inject error -22 to module_enable_rodata_ro_after_init for
> > > > > brd module:
> > > > >
> > > > > sudo moderr --modname=brd --modfunc=module_enable_rodata_ro_after_init \
> > > > >      --error=-22 --trace
> > > > > Monitoring module error injection... Hit Ctrl-C to end.
> > > > > MODULE   ERROR   FUNCTION
> > > > > brd      -22     module_enable_rodata_after_init()
> > > > >
> > > > > Kernel messages:
> > > > > [ 89.463690] brd: module loaded
> > > > > [ 89.463855] brd: module_enable_rodata_ro_after_init() returned -22,
> > > > > ro_after_init data might still be writable
> > > > >
> > > > > Signed-off-by: Daniel Gomez <da.gomez@xxxxxxxxxxx>
> > > > > ---
> > > > >  tools/bpf/Makefile            |  13 ++-
> > > > >  tools/bpf/moderr/.gitignore   |   2 +
> > > > >  tools/bpf/moderr/Makefile     |  95 +++++++++++++++++
> > > > >  tools/bpf/moderr/moderr.bpf.c | 127 +++++++++++++++++++++++
> > > > >  tools/bpf/moderr/moderr.c     | 236 ++++++++++++++++++++++++++++++++++++++++++
> > > > >  tools/bpf/moderr/moderr.h     |  40 +++++++
> > > > >  6 files changed, 510 insertions(+), 3 deletions(-)
> > > >
> > > > The tool looks useful, but we don't add tools to the kernel repo.
> > > > It has to stay out of tree.
> > >
> > > For selftests we do add random tools.
> > >
> > > > The value of error injection is not clear to me.
> > >
> > > It is of great value, since it deals with corner cases which are
> > > otherwise hard to reproduce in places where a real error can be
> > > catastrophic.
> > >
> > > > Other places in the kernel use it to test paths in the kernel
> > > > that are difficult to do otherwise.
> > >
> > > Right.
> > >
> > > > These 3 functions don't seem to be in this category.
> > >
> > > That's the key here we should focus on. The problem is when a maintainer
> > > *does* agree that adding an error injection entry is useful for testing,
> > > and we have a developer willing to do the work to help test / validate
> > > it. In this case, this error case is rare but we do want to strive to
> > > test this as we ramp up and extend our modules selftests.
> > >
> > > Then there is the aspect of how to mitigate how intrusive code changes
> > > to allow error injection are. In 2021 we evaluated the prospect of error
> > > injection in-kernel long ago for other areas like the block layer for
> > > add_disk() failures [0] but the minimal interface to enable this from
> > > userspace with debugfs was considered just too intrusive.
> > >
> > > This effort tried to evaluate what this could look like with eBPF to
> > > mitigate the required in-kernel code, and I believe the lightweight
> > > nature of it by just requiring a sprinkle with ALLOW_ERROR_INJECTION()
> > > suffices to my taste.
> > >
> > > So, perhaps the tools aspect can just go in:
> > >
> > > tools/testing/selftests/module/
> >
> > but why would it be module-specific?
>
> Gotta start somewhere.
>
> > Based on its current implementation
> > and discussion about inject.py it seems to be generic enough to be
> > useful to test any function annotated with ALLOW_ERROR_INJECTION().
> >
> > As xe driver maintainer, it may be interesting to use such a tool:
> >
> > $ git grep ALLOW_ERROR_INJECT -- drivers/gpu/drm/xe | wc -l
> > 23
> >
> > How does this approach compare to writing the function name on debugfs
> > (the current approach in xe's testsuite)?
> >
> > fail_function @ https://docs.kernel.org/fault-injection/fault-injection.html#fault-injection-capabilities-infrastructure
> > https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/xe_fault_injection.c?ref_type=heads#L108
> >
> > If you decide to have the tool to live somewhere else, then kmod repo
> > could be a candidate.
>
> Would we install this upon install target?
>
> Danny can decide on this :)
>
> > Although I think having it in kernel tree is
> > simpler maintenance-wise.
>
> I think we have at least two users upstream who can make use of it. If
> we end up going through tools/testing/selftests/module/ first, can't
> you make use of it later?

What are the features in debugfs required to be useful for xe that we
can port to an eBPF version? I see from the link provided the use of
probability, interval, times and space, but these are configured to
always trigger the error. Is that right?

>
> Luis
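
For reference, a minimal sketch of what an "always trigger" fail_function
setup looks like, assuming the debugfs layout described in the kernel's
fault-injection documentation. The helper name configure_always_fail and
the FAIL_FUNCTION_DIR override are hypothetical, added only so the
sequence can be dry-run against a scratch directory. This is not taken
from the xe test or from moderr.

```shell
#!/bin/sh
# Sketch only: make every call to one ALLOW_ERROR_INJECTION()-annotated
# function fail with a fixed return value through fail_function.
# configure_always_fail and FAIL_FUNCTION_DIR are hypothetical names.
configure_always_fail() {
    func=$1      # function to fail, e.g. one listed in "injectable"
    retval=$2    # negative errno to return, e.g. -22
    root=${FAIL_FUNCTION_DIR:-/sys/kernel/debug/fail_function}

    # Register the function; on real debugfs this creates $root/$func/.
    echo "$func" > "$root/inject" || return 1
    # retval is written in hex, as in the documentation example.
    printf '%#x' "$retval" > "$root/$func/retval" || return 1

    echo N   > "$root/task-filter" || return 1  # do not filter by task
    echo 100 > "$root/probability" || return 1  # fail 100% of the calls
    echo 0   > "$root/interval"    || return 1  # no gap between failures
    echo -1  > "$root/times"       || return 1  # no limit on failure count
    echo 0   > "$root/space"       || return 1  # no initial budget to skip
    echo 1   > "$root/verbose"     || return 1  # log each injected failure
}
```

With these values the four knobs add no randomness or rate limiting,
so an eBPF equivalent of this particular setup would only need the
function name and the return value.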