Hello everyone,
Recently my kernel panicked and restarted while I was running the LTP
test oom02 (it allocates memory in an endless loop to test whether the
OOM killer works properly).
Log of the panic:
```
[480156.950100] Tasks state (memory values in pages):
[480156.950101] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[480156.950302] [ 2578] 81 2578 523 0 393216 6 -900 dbus-daemon
[480156.950309] [ 2648] 172 2596 2435 0 393216 5 0 rtkit-daemon
[480156.950322] [ 5256] 0 2826 25411 0 589824 0 0 DetectThread
[480156.950328] [ 5404] 0 5404 412 2 393216 64 -1000 sshd
[480156.950357] [ 10518] 0 10518 2586 0 393216 10 0 at-spi2-registr
[480156.950361] [ 10553] 0 10551 10543 0 458752 9 0 QXcbEventQueue
[480156.950365] [ 10867] 0 10567 17579 0 589824 16 0 QXcbEventQueue
[480156.950370] [ 10928] 0 10921 6999 0 458752 17 0 QXcbEventQueue
[480156.950390] [ 11882] 0 11811 7377 0 458752 10 0 QXcbEventQueue
[480156.950394] [ 12052] 0 12052 5823 0 458752 21 0 fcitx
[480156.950404] [ 12115] 0 12114 11678 0 524288 21 0 QXcbEventQueue
[480156.950408] [ 101558] 0 101558 3549 0 393216 0 0 runltp
[480156.950486] [1068864] 0 1068864 771 6 327680 85 -1000 systemd-udevd
[480156.950552] [1035639] 0 1035639 52 0 393216 14 -1000 oom02
[480156.950556] [1035640] 0 1035640 52 0 393216 23 -1000 oom02
[480156.950561] [1036065] 0 1036065 493 60 393216 0 -250 systemd-journal
[480156.950565] [1036087] 0 1036073 6258739 3543942 37814272 0 0 oom02
[480156.950572] Out of memory and no killable processes...
[480156.950575] Kernel panic - not syncing: System is deadlocked on memory
```
The oom02 process 1036073 had already been killed about 4.7 seconds
before the panic.
Log of the kill:
```
[480152.242506] [1035177] 0 1035177 4773 20 393216 115 0 sssd_nss
[480152.242510] [1035376] 0 1035376 25500 391 589824 602 0 tuned
[480152.242514] [1035639] 0 1035639 52 0 393216 14 -1000 oom02
[480152.242517] [1035640] 0 1035640 52 0 393216 19 -1000 oom02
[480152.242522] [1036065] 0 1036065 493 114 393216 62 -250 systemd-journal
[480152.242525] [1036073] 0 1036073 6258739 3540314 37814272 104 0 oom02
[480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
[480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
[480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
```
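As a cross-check of the numbers in the kill report, here is a minimal sketch of the arithmetic. The task table is in pages while the "Killed process" line is in kB, so the page size can be inferred from the two. The 64 kB page size below is only derived from the log, it is not something I configured or verified separately, and the variable names are mine.

```python
# Figures copied from the kill report for pid 1036073.
TOTAL_VM_PAGES = 6258739         # total_vm column (pages)
RSS_PAGES = 3540314              # rss column (pages)
TOTAL_VM_KB = 400559296          # "total-vm" in the "Killed process" line
ANON_RSS_KB = 226578368          # "anon-rss" before the reaper ran
FILE_RSS_KB = 1728               # "file-rss" before the reaper ran
REAPED_ANON_RSS_KB = 226594048   # "anon-rss" printed by oom_reaper


def infer_page_size_kb(pages, kb):
    """Return kB per page, failing loudly if the figures do not divide evenly."""
    size, remainder = divmod(kb, pages)
    if remainder:
        raise ValueError("kB figure is not a whole multiple of the page count")
    return size


page_kb = infer_page_size_kb(TOTAL_VM_PAGES, TOTAL_VM_KB)
rss_kb = RSS_PAGES * page_kb

# A positive value would mean the reaper freed anonymous memory.
reclaimed_kb = ANON_RSS_KB - REAPED_ANON_RSS_KB

print("inferred page size:", page_kb, "kB")
print("rss from task table:", rss_kb, "kB")
print("anon-rss + file-rss:", ANON_RSS_KB + FILE_RSS_KB, "kB")
print("reclaimed by oom_reaper:", reclaimed_kb, "kB")
```

The rss column matches anon-rss plus file-rss exactly, and the "reclaimed" figure comes out negative: anon-rss was slightly higher after the reaper ran than before it.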
However, its memory could not be reclaimed. I added trace logging to
the oom_reaper code in the kernel and found a large VMA whose memory
is not reclaimed. The VMA has the `VM_LOCKED` flag, so the reaper
cannot reclaim it immediately.
```log
oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
oom_reaper-57 [007] .... 126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
oom_reaper-57 [007] .... 126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
oom_reaper-57 [007] .... 126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
oom_reaper-57 [007] .... 126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
```
`vma continue: 1056883, range:3221225472` is the memory that cannot be
reclaimed. 1056883 (0x102073) is vma->vm_flags, and it has the
`VM_LOCKED` flag set.
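To make the flag values in the trace easier to read, here is a small decoding sketch. The bit values are the ones I know from include/linux/mm.h and only a subset is listed; they may differ between kernel versions. `reaper_skips()` is my simplified model of the check the reaper applies before unmapping a VMA (can_madv_dontneed_vma() in the kernel I am looking at, which I believe is named differently in later kernels), not the kernel code itself.

```python
# Subset of vm_flags bits (assumed from include/linux/mm.h, may vary by version).
VM_FLAGS = {
    0x00000001: "VM_READ",
    0x00000002: "VM_WRITE",
    0x00000004: "VM_EXEC",
    0x00000008: "VM_SHARED",
    0x00000010: "VM_MAYREAD",
    0x00000020: "VM_MAYWRITE",
    0x00000040: "VM_MAYEXEC",
    0x00000080: "VM_MAYSHARE",
    0x00000400: "VM_PFNMAP",
    0x00002000: "VM_LOCKED",
    0x00100000: "VM_ACCOUNT",
    0x00400000: "VM_HUGETLB",
}

VM_PFNMAP = 0x00000400
VM_LOCKED = 0x00002000
VM_HUGETLB = 0x00400000


def decode_vm_flags(flags):
    """Return the names of the known bits set in a vm_flags value."""
    return [name for bit, name in VM_FLAGS.items() if flags & bit]


def reaper_skips(flags):
    """Simplified model: the reaper leaves locked, hugetlb and PFN-mapped VMAs alone."""
    return bool(flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP))


# vm_flags values taken from the trace output.
for value in (1048691, 1056883, 112):
    print(hex(value), decode_vm_flags(value), "skipped:", reaper_skips(value))
```

Under these assumptions, 1048691 (0x100073) and 1056883 (0x102073) differ only in the `VM_LOCKED` bit, which matches the trace: the first is handled as anonymous memory and the second hits the `continue` path.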
oom02 creates `nr_cpu` threads, and each of them uses mmap to allocate
memory. mmap merges contiguous VMAs into one, so as long as one thread
is still running, the entire VMA is not released.
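The merging effect can be illustrated with a toy model. This is only a sketch: the kernel's real merge logic checks more than adjacency and equal flags, and the base address and the three 1 GiB chunks below are hypothetical values I picked so that they add up to the 3221225472-byte range in the trace. I do not know the actual layout of the individual mappings.

```python
def merge_adjacent(vmas):
    """Merge (start, end, flags) ranges that touch and carry identical flags."""
    merged = []
    for start, end, flags in sorted(vmas):
        if merged and merged[-1][1] == start and merged[-1][2] == flags:
            # Extend the previous range instead of starting a new one.
            merged[-1] = (merged[-1][0], end, flags)
        else:
            merged.append((start, end, flags))
    return merged


BASE = 0x400000000000      # hypothetical start address
CHUNK = 1 << 30            # hypothetical 1 GiB per mapping
LOCKED_FLAGS = 0x102073    # vm_flags of the unreclaimed VMA in the trace
PLAIN_FLAGS = 0x100073     # vm_flags of the reclaimable VMAs in the trace

# Three back-to-back locked mappings plus one unlocked neighbour.
vmas = [(BASE + i * CHUNK, BASE + (i + 1) * CHUNK, LOCKED_FLAGS) for i in range(3)]
vmas.append((BASE + 3 * CHUNK, BASE + 3 * CHUNK + 65536, PLAIN_FLAGS))

merged = merge_adjacent(vmas)
for start, end, flags in merged:
    print(hex(start), hex(end), hex(flags), "size:", end - start)
```

In this model the three locked mappings collapse into a single 3221225472-byte range, while the unlocked neighbour stays separate because its flags differ.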
In extreme cases, the system can crash because this memory is never
reclaimed.
I'm not sure whether this is a kernel bug.
--
thanks,
Gou Hao <gouhao@xxxxxxxxxxxxx>