Re: selftests: cgroup: Failures – Timeouts & OOM Issues Analysis

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Naresh.

On Tue, Mar 04, 2025 at 05:26:45PM +0530, Naresh Kamboju <naresh.kamboju@xxxxxxxxxx> wrote:
> As part of LKFT’s re-validation of known issues, we have observed that
> the selftests: cgroup suite is consistently failing across almost all
> LKFT-supported devices due to:
>  - Test timeouts (45 seconds limit reached)
>  - OOM-killer invocation

Thanks for reporting the issues with the tests.

> ## Key Questions for Discussion:
>  - Would it be beneficial to increase the test timeout to ~180 seconds
>    to allow sufficient execution time?

That depends.

test_cpu has some lenghtier checks and they can in sum surpass 45s,
it'd might be better to shorten them (withing precision margin) instead
of prolonging the limit.

test_kmem -- it shouldn't take so long, if anything I'd suspect
/proc/kpagecgroup -- are your systems larger than 100GiB of memory
(that's my rough estimate for this reads to take above the limit)?

(Are there any other timeouts?)

OOM -- some tests are supposed to trigger memcg OOM.

>  - Should we enhance logging to explicitly print failure reasons when a
>    test fails?

These tests are useful when run by developers them_selves_. In such a
case it's handy to obtain more info running them understrace (since
they're so simple).

>  - Are there any missing dependencies that could be causing these failures?
>      Note: The required selftests/cgroup/config options were included in
>      LKFT's build and test plans.

The deps are rather minimal, only some coreutils (cgroup selftests
should be covered by e.g. this list [1]).



> 
> ## Devices Affected:
> The following DUTs consistently experience these failures:
>   -  dragonboard-410c (arm64)
>   -  dragonboard-845c (arm64)
>   -  e850-96 (arm64)
>   -  juno-r2 (arm64)
>   -  qemu-arm64 (arm64)
>   -  qemu-armv7
>   -  qemu-x86_64
>   -  rk3399-rock-pi-4b (arm64)
>   -  x15 (arm)
>   -  x86_64
> 
> Regression Analysis:
>  - New regression? No (these failures have been observed for months/years).

Actually, I noticed test_memcontrol failure yesterday (with ~mainline
kernel) but I remember they used to work also rather recently. I haven't
got time to look into that but at least that one may be a regression (in
code or test).

>  - Reproducibility? Yes, the failures occur consistently.

+/- as that may depend no nr_cpus or totalram.

>  - Test suite affected? selftests: cgroup (timeouts and OOM-related failures).


Michal

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux