On 9/21/24 23:08, Guenter Roeck wrote:
> On 9/21/24 13:40, Vlastimil Babka wrote:
>> +CC kunit folks
>>
>> On 9/20/24 15:35, Guenter Roeck wrote:
>>> Hi,
>>
>> Hi,
>>
>>> On Wed, Aug 07, 2024 at 12:31:20PM +0200, Vlastimil Babka wrote:
>>>> Add a test that will create cache, allocate one object, kfree_rcu() it
>>>> and attempt to destroy it. As long as the usage of kvfree_rcu_barrier()
>>>> in kmem_cache_destroy() works correctly, there should be no warnings in
>>>> dmesg and the test should pass.
>>>>
>>>> Additionally add a test_leak_destroy() test that leaks an object on
>>>> purpose and verifies that kmem_cache_destroy() catches it.
>>>>
>>>> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
>>>
>>> This test case, when run, triggers a warning traceback.
>>>
>>> kmem_cache_destroy TestSlub_kfree_rcu: Slab cache still has objects when called from test_leak_destroy+0x70/0x11c
>>> WARNING: CPU: 0 PID: 715 at mm/slab_common.c:511 kmem_cache_destroy+0x1dc/0x1e4
>>
>> Yes, that should be suppressed like the other slub_kunit tests do. I have
>> assumed it's not that urgent because, for example, the KASAN kunit tests all
>> produce tons of warnings, and thus assumed it's in some way acceptable for
>> kunit tests to do.
>>
>
> I have all tests which generate warning backtraces disabled. Trying to identify
> which warnings are noise and which warnings are on purpose doesn't scale,
> so it is all or nothing for me. I tried earlier to introduce a patch series
> which would enable selective backtrace suppression, but that died the death
> of architecture maintainers not caring and people demanding it to be perfect
> (meaning it only addressed WARNING: backtraces and not BUG: backtraces,
> and apparently that wasn't good enough).

Ah, didn't know, too bad.

> If the backtrace is intentional (and I think you are saying that it is),
> I'll simply disable the test. That may be a bit counter-productive, but
> there is really no alternative for me.
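For context, the first test from the patch boils down to roughly the following sketch (struct and field names are approximate, reconstructed from the commit message and the cache name in the warning above, not copied from the actual patch):

```c
struct kunit_rcu_obj {
	int field;
	struct rcu_head rcu;
};

static void test_kfree_rcu(struct kunit *test)
{
	struct kmem_cache *s = kmem_cache_create("TestSlub_kfree_rcu",
					sizeof(struct kunit_rcu_obj),
					0, 0, NULL);
	struct kunit_rcu_obj *p = kmem_cache_alloc(s, GFP_KERNEL);

	/* Queue the object for freeing after an RCU grace period. */
	kfree_rcu(p, rcu);

	/*
	 * The object may still be pending the grace period here, so
	 * kmem_cache_destroy() must flush it via kvfree_rcu_barrier()
	 * or it would warn that the cache still has objects.
	 */
	kmem_cache_destroy(s);
}
```

The test passes exactly when no "Slab cache still has objects" warning is emitted, i.e. when the kvfree_rcu_barrier() in kmem_cache_destroy() does its job.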
It's intentional in the sense that the test deliberately triggers a condition
that normally produces a warning. Many of the slub kunit tests do that, but
they are able to suppress printing the warning when it happens in the kunit
context. I forgot to do that for the new test initially, as the warning there
comes from a different path than those that already have the kunit
suppression, but we'll implement that suppression there too ASAP.

>>> That is, however, not the worst of it. It also causes boot stalls on
>>> several platforms and architectures (various arm platforms, arm64,
>>> loongarch, various ppc, and various x86_64). Reverting it fixes the
>>> problem. Bisect results are attached for reference.
>>
>> OK, this part is unexpected. I assume you have the test built-in and not a
>> module, otherwise it can't affect boot? And by stall you mean a delay or a
>
> Yes.
>
>> complete lockup? I've tried to reproduce that with virtme, but it seemed
>> fine, maybe it's .config specific?
>
> It is a complete lockup.
>
>>
>> I do wonder about the placement of the call of kunit_run_all_tests() from
>> kernel_init_freeable(), as that's before a bunch of initialization finishes.
>>
>> For example, system_state = SYSTEM_RUNNING; and rcu_end_inkernel_boot() only
>> happen later in kernel_init(). I wouldn't be surprised if that means
>> calling kfree_rcu() or rcu_barrier() or kvfree_rcu_barrier() as part of the
>> slub tests is too early.
>>
>> Does the diff below fix the problem? Any advice from kunit folks? I could
>> perhaps make the slab test module-only instead of tristate, or do
>> some ifdef builtin on the problematic tests, but maybe it wouldn't be
>> necessary with kunit_run_all_tests() happening a bit later.
>>
>
> It does, at least based on my limited testing. However, given that the
> backtrace is apparently intentional, it doesn't really matter - I'll disable
> the test instead.

Right, thanks.
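For reference, the suppression the existing slub kunit tests rely on is (roughly, from memory) a helper in mm/slub.c along these lines: when the code runs under a kunit test, error reports bump a counter stored in a named kunit resource instead of printing a backtrace, and the test then asserts on that counter. The warning above fires from mm/slab_common.c, which has no such hook yet, which is why this test wasn't covered:

```c
/*
 * Sketch of the mm/slub.c suppression pattern; details may differ
 * from the current tree.
 */
static bool slab_add_kunit_errors(void)
{
	struct kunit_resource *resource;

	/* Not running under kunit: report the error normally. */
	if (!kunit_get_current_test())
		return false;

	resource = kunit_find_named_resource(current->kunit_test,
					     "slab_errors");
	if (!resource)
		return false;

	/* Count the error for the test instead of printing it. */
	(*(int *)resource->data)++;
	kunit_put_resource(resource);
	return true;
}
```

Warning sites then do `if (slab_add_kunit_errors()) goto out;` (or similar) before printing, so adding an equivalent check to the kmem_cache_destroy() path should be enough to silence this test.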
I will notify you when the suppression is done so the test can be re-enabled.
But since the lockup should not be caused by the warning itself, we will
still have to address it: others who build the kunit tests in might hit the
lockup, and you would see it again once the warning is suppressed and the
test is re-enabled. Your testing suggests moving the kunit_run_all_tests()
call might be the right fix, so let's see what the kunit folks think. I would
hope that no other kunit test breaks from being executed later in boot,
because when the tests are compiled as modules they already run much later
anyway.

Thanks!
Vlastimil

> Thanks,
> Guenter
>