Hello Naresh.

On Tue, Mar 04, 2025 at 05:26:45PM +0530, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> As part of LKFT’s re-validation of known issues, we have observed that
> the selftests: cgroup suite is consistently failing across almost all
> LKFT-supported devices due to:
> - Test timeouts (45 seconds limit reached)
> - OOM-killer invocation

Thanks for reporting the issues with the tests.

> ## Key Questions for Discussion:
> - Would it be beneficial to increase the test timeout to ~180 seconds
> to allow sufficient execution time?

That depends.

test_cpu has some lengthier checks and they can in sum surpass 45s; it
might be better to shorten them (within precision margin) instead of
prolonging the limit.

test_kmem -- it shouldn't take so long; if anything, I'd suspect
/proc/kpagecgroup -- are your systems larger than 100GiB of memory
(that's my rough estimate for these reads to take above the limit)?

(Are there any other timeouts?)

OOM -- some tests are supposed to trigger memcg OOM.

> - Should we enhance logging to explicitly print failure reasons when a
> test fails?

These tests are useful when run by developers them_selves_. In such a
case it's handy to obtain more info by running them under strace (since
they're so simple).

> - Are there any missing dependencies that could be causing these failures?
> Note: The required selftests/cgroup/config options were included in
> LKFT's build and test plans.

The deps are rather minimal, only some coreutils (cgroup selftests
should be covered by e.g. this list [1]).

>
> ## Devices Affected:
> The following DUTs consistently experience these failures:
> - dragonboard-410c (arm64)
> - dragonboard-845c (arm64)
> - e850-96 (arm64)
> - juno-r2 (arm64)
> - qemu-arm64 (arm64)
> - qemu-armv7
> - qemu-x86_64
> - rk3399-rock-pi-4b (arm64)
> - x15 (arm)
> - x86_64
>
> Regression Analysis:
> - New regression? No (these failures have been observed for months/years).

Actually, I noticed a test_memcontrol failure yesterday (with ~mainline
kernel) but I remember it used to work rather recently. I haven't had
time to look into that but at least that one may be a regression (in
code or test).

> - Reproducibility? Yes, the failures occur consistently.

+/- as that may depend on nr_cpus or totalram.

> - Test suite affected? selftests: cgroup (timeouts and OOM-related failures).

Michal