Question about oom-killer

Gou Hao <gouhao@xxxxxxxxxxxxx> · Wed, 31 May 2023 16:42:43 +0800

Hello everyone,

Recently, my kernel restarted while I was running ltp-oom02 (it allocates
memory in an infinite loop to test whether the oom-killer works
properly).
Log:
```
[480156.950100] Tasks state (memory values in pages):
[480156.950101] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[480156.950302] [   2578]    81  2578      523        0 393216        6          -900 dbus-daemon
[480156.950309] [   2648]   172  2596     2435        0 393216        5             0 rtkit-daemon
[480156.950322] [   5256]     0  2826    25411        0 589824        0             0 DetectThread
[480156.950328] [   5404]     0  5404      412        2 393216       64         -1000 sshd
[480156.950357] [  10518]     0 10518     2586        0 393216       10             0 at-spi2-registr
[480156.950361] [  10553]     0 10551    10543        0 458752        9             0 QXcbEventQueue
[480156.950365] [  10867]     0 10567    17579        0 589824       16             0 QXcbEventQueue
[480156.950370] [  10928]     0 10921     6999        0 458752       17             0 QXcbEventQueue
[480156.950390] [  11882]     0 11811     7377        0 458752       10             0 QXcbEventQueue
[480156.950394] [  12052]     0 12052     5823        0 458752       21             0 fcitx
[480156.950404] [  12115]     0 12114    11678        0 524288       21             0 QXcbEventQueue
[480156.950408] [ 101558]     0 101558     3549        0 393216        0             0 runltp
[480156.950486] [1068864]     0 1068864      771        6 327680       85         -1000 systemd-udevd
[480156.950552] [1035639]     0 1035639       52        0 393216       14         -1000 oom02
[480156.950556] [1035640]     0 1035640       52        0 393216       23         -1000 oom02
[480156.950561] [1036065]     0 1036065      493       60 393216        0          -250 systemd-journal
[480156.950565] [1036087]     0 1036073  6258739  3543942 37814272        0             0 oom02
[480156.950572] Out of memory and no killable processes...
[480156.950575] Kernel panic - not syncing: System is deadlocked on memory
```

oom02 (pid 1036073) had already been killed before the crash.
Log:
```
[480152.242506] [1035177]     0 1035177     4773       20 393216      115             0 sssd_nss
[480152.242510] [1035376]     0 1035376    25500      391 589824      602             0 tuned
[480152.242514] [1035639]     0 1035639       52        0 393216       14         -1000 oom02
[480152.242517] [1035640]     0 1035640       52        0 393216       19         -1000 oom02
[480152.242522] [1036065]     0 1036065      493      114 393216       62          -250 systemd-journal
[480152.242525] [1036073]     0 1036073  6258739  3540314 37814272      104             0 oom02
[480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
[480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
[480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
```
However, its memory was not reclaimed: anon-rss is still 226594048kB
after the oom_reaper ran. I added trace logging to the oom_reaper code
in the kernel and found one VMA covering a large range that is not
reclaimed. That VMA has the `VM_LOCKED` flag set, so it cannot be
reclaimed immediately.
```log
      oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
      oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
      oom_reaper-57    [007] ....   126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
      oom_reaper-57    [007] ....   126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
      oom_reaper-57    [007] ....   126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
```
`vma continue: 1056883, range:3221225472` is the memory that cannot be
reclaimed. 1056883 (0x102073) is vma->vm_flags, and it has the
`VM_LOCKED` flag set.

oom02 creates `nr_cpu` threads and uses mmap to allocate memory. mmap
merges contiguous VMAs into one, so as long as one thread is still
running, the entire merged VMA is not released.

In extreme cases, crashes may occur due to the lack of memory reclamation.

I'm not sure whether this is a kernel bug.

--
thanks,
Gou Hao <gouhao@xxxxxxxxxxxxx>