Re: [LTP] Question about oom02 testcase

On 6/1/23 18:50, Li Wang wrote:
On Thu, Jun 1, 2023 at 5:46 PM Gou Hao <gouhao@xxxxxxxxxxxxx> wrote:

On 6/1/23 16:18, Li Wang wrote:

Hi Hao,

Thanks for reporting this; see my comments below.

On Tue, May 30, 2023 at 9:26 AM Gou Hao <gouhao@xxxxxxxxxxxxx> wrote:

Hello everyone,

Recently, the kernel panicked and the machine restarted while I was running oom02.
Log:
```
[480156.950100] Tasks state (memory values in pages):
[480156.950101] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[480156.950302] [   2578]    81  2578      523        0 393216        6          -900 dbus-daemon
[480156.950309] [   2648]   172  2596     2435        0 393216        5             0 rtkit-daemon
[480156.950322] [   5256]     0  2826    25411        0 589824        0             0 DetectThread
[480156.950328] [   5404]     0  5404      412        2 393216       64         -1000 sshd
[480156.950357] [  10518]     0 10518     2586        0 393216       10             0 at-spi2-registr
[480156.950361] [  10553]     0 10551    10543        0 458752        9             0 QXcbEventQueue
[480156.950365] [  10867]     0 10567    17579        0 589824       16             0 QXcbEventQueue
[480156.950370] [  10928]     0 10921     6999        0 458752       17             0 QXcbEventQueue
[480156.950390] [  11882]     0 11811     7377        0 458752       10             0 QXcbEventQueue
[480156.950394] [  12052]     0 12052     5823        0 458752       21             0 fcitx
[480156.950404] [  12115]     0 12114    11678        0 524288       21             0 QXcbEventQueue
[480156.950408] [ 101558]     0 101558     3549        0 393216        0             0 runltp
[480156.950486] [1068864]     0 1068864      771        6 327680       85         -1000 systemd-udevd
[480156.950552] [1035639]     0 1035639       52        0 393216       14         -1000 oom02
[480156.950556] [1035640]     0 1035640       52        0 393216       23         -1000 oom02
[480156.950561] [1036065]     0 1036065      493       60 393216        0          -250 systemd-journal
[480156.950565] [1036087]     0 1036073  6258739  3543942 37814272        0             0 oom02
[480156.950572] Out of memory and no killable processes...
[480156.950575] Kernel panic - not syncing: System is deadlocked on memory
```
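The `Tasks state` table reports memory in pages, but the `Killed process` line further down reports kB. For oom02 (tgid 1036073), 400559296 kB / 6258739 pages = 64 kB. This suggests the kernel uses 64 KiB pages, an inference from these two numbers only; the report does not state it. Below is a small sketch with hypothetical helper names that converts between the two units.

```python
# Hypothetical helpers for reading the OOM "Tasks state" dump, whose
# memory columns (total_vm, rss) are in pages rather than kB.

def infer_page_size_kb(total_vm_pages: int, total_vm_kb: int) -> int:
    """Infer the page size (kB) from total_vm in pages and total-vm in kB."""
    if total_vm_kb % total_vm_pages:
        raise ValueError("total-vm is not a whole multiple of total_vm")
    return total_vm_kb // total_vm_pages


def pages_to_kb(pages: int, page_size_kb: int) -> int:
    """Convert a page count from the task dump to kB."""
    return pages * page_size_kb


# oom02, tgid 1036073: total_vm 6258739 pages vs. total-vm:400559296kB.
page_kb = infer_page_size_kb(6258739, 400559296)
print(page_kb)  # 64

# rss at kill time (3540314 pages) equals anon-rss + file-rss in kB.
print(pages_to_kb(3540314, page_kb) == 226578368 + 1728)  # True
```

The same ratio holds for the later overcommit_memory=2 run: 295609 pages vs. total-vm:18918976kB.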

oom02 (tgid 1036073) had already been killed before the crash.
Log:
```
[480152.242506] [1035177]     0 1035177     4773       20 393216      115             0 sssd_nss
[480152.242510] [1035376]     0 1035376    25500      391 589824      602             0 tuned
[480152.242514] [1035639]     0 1035639       52        0 393216       14         -1000 oom02
[480152.242517] [1035640]     0 1035640       52        0 393216       19         -1000 oom02
[480152.242522] [1036065]     0 1036065      493      114 393216       62          -250 systemd-journal
[480152.242525] [1036073]     0 1036073  6258739  3540314 37814272      104             0 oom02
[480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
[480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
[480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
```
But its memory could not be reclaimed. I added trace logging to the
oom_reaper code in the kernel and found a large VMA that could not be
reclaimed: the VMA has the `VM_LOCKED` flag, so it cannot be reclaimed
immediately.
```log
        oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
        oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
        oom_reaper-57    [007] ....   126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
        oom_reaper-57    [007] ....   126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
        oom_reaper-57    [007] ....   126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
```
`vma continue: 1056883, range:3221225472` is the memory that cannot be
reclaimed (3221225472 bytes = 3 GiB). 1056883 (0x102073) is vma->vm_flags,
and it has the `VM_LOCKED` flag set.
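The two vm_flags values in the trace differ by exactly 0x2000 (1056883 - 1048691 = 8192), which is the `VM_LOCKED` bit. Below is a minimal decoding sketch. The flag values are assumed from include/linux/mm.h of 4.19-era kernels. The skip mask is assumed to mirror `can_madv_dontneed_vma()` (VM_LOCKED | VM_HUGETLB | VM_PFNMAP), which `__oom_reap_task_mm()` used in that era to pass over VMAs. Both should be checked against the exact (non-upstream) kernel source before relying on them.

```python
# Subset of vm_flags bits as defined in include/linux/mm.h of 4.19-era
# kernels (assumed values; verify against the exact kernel source in use).
VM_FLAG_BITS = {
    "VM_READ": 0x00000001,
    "VM_WRITE": 0x00000002,
    "VM_EXEC": 0x00000004,
    "VM_SHARED": 0x00000008,
    "VM_MAYREAD": 0x00000010,
    "VM_MAYWRITE": 0x00000020,
    "VM_MAYEXEC": 0x00000040,
    "VM_MAYSHARE": 0x00000080,
    "VM_PFNMAP": 0x00000400,
    "VM_LOCKED": 0x00002000,
    "VM_LOCKONFAULT": 0x00080000,
    "VM_ACCOUNT": 0x00100000,
    "VM_NORESERVE": 0x00200000,
    "VM_HUGETLB": 0x00400000,
}

# Assumption: mirrors can_madv_dontneed_vma() in 4.19-era kernels, which
# the oom_reaper uses to skip VMAs it must not unmap.
REAPER_SKIP_MASK = (
    VM_FLAG_BITS["VM_LOCKED"] | VM_FLAG_BITS["VM_HUGETLB"] | VM_FLAG_BITS["VM_PFNMAP"]
)


def decode_vm_flags(flags: int) -> list:
    """Return the names of the known bits that are set in flags."""
    return [name for name, bit in VM_FLAG_BITS.items() if flags & bit]


def reaper_would_skip(flags: int) -> bool:
    """True if the (modeled) oom_reaper would leave this VMA alone."""
    return bool(flags & REAPER_SKIP_MASK)


# vm_flags values from the trace: 1048691 for the ordinary anonymous VMAs,
# 1056883 for the 3 GiB VMA that was skipped.
for flags in (1048691, 1056883):
    action = "skip" if reaper_would_skip(flags) else "reap"
    print(hex(flags), "|".join(decode_vm_flags(flags)), action)
```

This prints `0x100073 VM_READ|VM_WRITE|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|VM_ACCOUNT reap` and `0x102073 VM_READ|VM_WRITE|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|VM_LOCKED|VM_ACCOUNT skip`.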

oom02 creates `nr_cpu` threads, which use mmap to allocate memory. mmap
merges contiguous VMAs into one, so as long as one thread is still
running, the entire VMA will not be released.
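The merging described above can be pictured with a toy model: neighbouring mappings that touch and carry identical flags collapse into one VMA. This is a simplified sketch, not the kernel's `vma_merge()`. The real function also compares the backing file, page offset, anon_vma, mempolicy and more. The address layout below (three threads each mapping and locking 1 GiB back to back) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass
class Vma:
    start: int  # inclusive
    end: int    # exclusive
    flags: int


def merge_adjacent(vmas):
    """Toy model: coalesce VMAs that touch and carry identical flags.

    The kernel's vma_merge() also compares the backing file, page offset,
    anon_vma, mempolicy and other attributes; all of that is ignored here.
    """
    merged = []
    for vma in sorted(vmas, key=lambda v: v.start):
        last = merged[-1] if merged else None
        if last is not None and last.end == vma.start and last.flags == vma.flags:
            last.end = vma.end  # extend the previous (copied) VMA
        else:
            merged.append(Vma(vma.start, vma.end, vma.flags))
    return merged


# Hypothetical layout: three 1 GiB locked chunks placed back to back
# (mmap usually allocates top-down, hence the descending addresses).
base = 0x7F0000000000
locked = [Vma(base - (i + 1) * GIB, base - i * GIB, 0x102073) for i in range(3)]
# An adjacent, unlocked anonymous mapping with different flags.
unlocked = Vma(base, base + 8 * (1 << 20), 0x100073)

for vma in merge_adjacent(locked + [unlocked]):
    print(hex(vma.start), vma.end - vma.start, hex(vma.flags))
```

In this model the three locked chunks become a single 3221225472-byte VMA. The unlocked neighbour stays separate because its flags differ.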

In extreme cases, crashes may occur due to the lack of memory reclamation.

My question is: is the crash in this case expected behavior, or a
bug (in the kernel or in LTP)?


The LTP oom tests were originally designed to verify that the OOM
mechanism works as expected for three types of memory allocation
(normal, mlock, and KSM).

Yes, your analysis is reasonable to some degree: the oom_reaper
might not reclaim a VMA with locked pages even after the
process is terminated.

But the exact behavior of the oom_reaper, and the conditions under
which it can or cannot reclaim VMAs, may vary with the specific
kernel version and configuration. So we shouldn't simply regard
the system panic as a kernel or LTP defect.
And BTW, which kernel version did you test?

Hi Li Wang,

Thank you for your reply.

My kernel version is 4.19, but it is not the upstream community version.

I have only encountered the crash once, and most of the time oom_reaper
can handle it well.

I tried to find a method or flag to prevent VMA merging during mmap, but
couldn't find one.

This might also be related to the value of overcommit_memory:
if we set it to 2 (strict mode), the oom_reaper can reclaim VM_LOCKED
memory more aggressively.

But as you can see, oom02 sets it to 1 (always overcommit) for the
whole test, which might be why your system couldn't recover from the
overcommit and eventually crashed.
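With overcommit_memory=2 the kernel refuses new commitments beyond `CommitLimit` (reported in /proc/meminfo next to `Committed_AS`). As a result, a large mmap() can fail up front with ENOMEM instead of the OOM killer running later. Below is a minimal sketch of that arithmetic. It follows the formula in the kernel's overcommit-accounting documentation. The strict-mode check is simplified: it ignores the admin/user reserves the kernel also subtracts. The machine sizes in the usage lines are hypothetical.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50,
                    hugetlb_kb=0, overcommit_kbytes=0):
    """CommitLimit under vm.overcommit_memory=2, in kB.

    Either overcommit_kbytes (if non-zero), or
    (RAM - hugetlb) * overcommit_ratio / 100, plus total swap.
    """
    if overcommit_kbytes:
        allowed = overcommit_kbytes
    else:
        allowed = (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100
    return allowed + swap_total_kb


def strict_mmap_allowed(request_kb, committed_as_kb, limit_kb):
    """Simplified strict-mode check: the new total must stay below the limit.

    Ignores vm.admin_reserve_kbytes and vm.user_reserve_kbytes, which the
    kernel also subtracts from the limit.
    """
    return committed_as_kb + request_kb < limit_kb


# Hypothetical machine: 256 GiB RAM, 4 GiB swap, default overcommit_ratio=50.
limit = commit_limit_kb(256 * 1024 * 1024, 4 * 1024 * 1024)
print(limit)  # 138412032
# A 200 GiB request on an otherwise idle system would be refused up front.
print(strict_mmap_allowed(200 * 1024 * 1024, 0, limit))  # False
```

With the default ratio of 50, strict mode allows only about half of RAM plus swap. This is consistent with oom02 mostly seeing ENOMEM from mmap() in that mode.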

I ran oom02 following your suggestion, with overcommit_memory set to 2.
Most of the time mmap() returns ENOMEM directly; the OOM killer was
triggered only about once, and even then the memory could not be
reclaimed quickly by the oom_reaper.

```
Jun  2 10:24:51 ltptest kernel: [  71588]     0 71588 792      244   393216        0             0 sshd
Jun  2 10:24:51 ltptest kernel: [  71590]     0 71590 792      150   393216        0             0 sshd
Jun  2 10:24:51 ltptest kernel: [  71591]     0 71591 3565      109   393216        0             0 bash
Jun  2 10:24:51 ltptest kernel: [  72118]     0 72118 3364       17   458752        0             0 sleep
Jun  2 10:24:51 ltptest kernel: [  72134]     0 72134 3364       17   393216        0             0 tail
Jun  2 10:24:51 ltptest kernel: [  72157]     0 72157 52       25   393216        0         -1000 oom02
Jun  2 10:24:51 ltptest kernel: [  72158]     0 72158 52       14   393216        0         -1000 oom02
Jun  2 10:24:51 ltptest kernel: [  72203]     0 72203   295609 244870  2359296        0             0 oom02
Jun  2 10:24:51 ltptest kernel: Out of memory: Kill process 72203 (oom02) score 373 or sacrifice child
Jun  2 10:24:51 ltptest kernel: Killed process 72203 (oom02) total-vm:18918976kB, anon-rss:15671680kB, file-rss:0kB, shmem-rss:0kB
Jun  2 10:24:51 ltptest kernel: oom_reaper: reaped process 72203 (oom02), now anon-rss:15681280kB, file-rss:0kB, shmem-rss:0kB
```
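In both runs, the anon-rss reported by `oom_reaper: reaped process` is slightly larger than at kill time. It is 15680 kB larger in the first log and 9600 kB larger in this one, so the reaper freed nothing measurable. Below is a small sketch, with hypothetical helper names, that extracts and compares the two values from the log lines quoted in this thread.

```python
import re

ANON_RSS = re.compile(r"anon-rss:(\d+)kB")


def anon_rss_kb(line: str) -> int:
    """Extract the anon-rss value (kB) from an OOM killer/reaper log line."""
    match = ANON_RSS.search(line)
    if match is None:
        raise ValueError(f"no anon-rss field in: {line!r}")
    return int(match.group(1))


def reaped_kb(killed_line: str, reaped_line: str) -> int:
    """anon-rss freed between 'Killed process' and 'oom_reaper: reaped'.

    A value <= 0 means the reaper freed nothing measurable.
    """
    return anon_rss_kb(killed_line) - anon_rss_kb(reaped_line)


killed = ("Killed process 72203 (oom02) total-vm:18918976kB, "
          "anon-rss:15671680kB, file-rss:0kB, shmem-rss:0kB")
reaped = ("oom_reaper: reaped process 72203 (oom02), now "
          "anon-rss:15681280kB, file-rss:0kB, shmem-rss:0kB")
print(reaped_kb(killed, reaped))  # -9600
```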

--
thanks,
Gou Hao <gouhao@xxxxxxxxxxxxx>




