Re: [LTP] Question about oom02 testcase

Gou Hao <gouhao@xxxxxxxxxxxxx> · Thu, 1 Jun 2023 17:45:50 +0800



    On 6/1/23 16:18, Li Wang wrote:

          Hi Hao,

          Thanks for reporting this; see my comments below.

          On Tue, May 30, 2023 at 9:26 AM Gou Hao <gouhao@xxxxxxxxxxxxx> wrote:

            Hello everyone,

            Recently, the kernel panicked and restarted while I was running oom02.

            Log:

            ```
            [480156.950100] Tasks state (memory values in pages):
            [480156.950101] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
            [480156.950302] [   2578]    81  2578      523        0   393216        6          -900 dbus-daemon
            [480156.950309] [   2648]   172  2596     2435        0   393216        5             0 rtkit-daemon
            [480156.950322] [   5256]     0  2826    25411        0   589824        0             0 DetectThread
            [480156.950328] [   5404]     0  5404      412        2   393216       64         -1000 sshd
            [480156.950357] [  10518]     0 10518     2586        0   393216       10             0 at-spi2-registr
            [480156.950361] [  10553]     0 10551    10543        0   458752        9             0 QXcbEventQueue
            [480156.950365] [  10867]     0 10567    17579        0   589824       16             0 QXcbEventQueue
            [480156.950370] [  10928]     0 10921     6999        0   458752       17             0 QXcbEventQueue
            [480156.950390] [  11882]     0 11811     7377        0   458752       10             0 QXcbEventQueue
            [480156.950394] [  12052]     0 12052     5823        0   458752       21             0 fcitx
            [480156.950404] [  12115]     0 12114    11678        0   524288       21             0 QXcbEventQueue
            [480156.950408] [ 101558]     0 101558     3549        0   393216        0             0 runltp
            [480156.950486] [1068864]     0 1068864      771        6   327680       85         -1000 systemd-udevd
            [480156.950552] [1035639]     0 1035639       52        0   393216       14         -1000 oom02
            [480156.950556] [1035640]     0 1035640       52        0   393216       23         -1000 oom02
            [480156.950561] [1036065]     0 1036065      493       60   393216        0          -250 systemd-journal
            [480156.950565] [1036087]     0 1036073  6258739  3543942 37814272        0             0 oom02
            [480156.950572] Out of memory and no killable processes...
            [480156.950575] Kernel panic - not syncing: System is deadlocked on memory
            ```

            
            oom02 (pid 1036073) had already been killed before the crash.

            Log:

            ```
            [480152.242506] [1035177]     0 1035177     4773       20   393216      115             0 sssd_nss
            [480152.242510] [1035376]     0 1035376    25500      391   589824      602             0 tuned
            [480152.242514] [1035639]     0 1035639       52        0   393216       14         -1000 oom02
            [480152.242517] [1035640]     0 1035640       52        0   393216       19         -1000 oom02
            [480152.242522] [1036065]     0 1036065      493      114   393216       62          -250 systemd-journal
            [480152.242525] [1036073]     0 1036073  6258739  3540314 37814272      104             0 oom02
            [480152.242529] Out of memory: Kill process 1036073 (oom02) score 755 or sacrifice child
            [480152.243869] Killed process 1036073 (oom02) total-vm:400559296kB, anon-rss:226578368kB, file-rss:1728kB, shmem-rss:0kB
            [480152.365804] oom_reaper: reaped process 1036073 (oom02), now anon-rss:226594048kB, file-rss:0kB, shmem-rss:0kB
            ```

            But its memory could not be reclaimed. I added trace logging to the
            oom_reaper code in the kernel and found a VMA covering a large range
            that was not reclaimed. The VMA has the `VM_LOCKED` flag set, so it
            cannot be reclaimed immediately.

            ```log
                   oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=65536
                   oom_reaper-57    [007] ....   126.063581: __oom_reap_task_mm: gh: vma is anon:1048691, range=196608
                   oom_reaper-57    [007] ....   126.063582: __oom_reap_task_mm: gh: vma continue: 1056883, range:3221225472
                   oom_reaper-57    [007] ....   126.063583: __oom_reap_task_mm: gh: vma is anon:112, range=65536
                   oom_reaper-57    [007] ....   126.063584: __oom_reap_task_mm: gh: vma is anon:1048691, range=8388608
            ```

            `vma continue: 1056883, range:3221225472` is the memory that cannot
            be reclaimed. 1056883 (0x102073) is vma->vm_flags, and it has the
            `VM_LOCKED` flag set.
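
The flag arithmetic in the trace can be checked by hand. The sketch below is a user-space model, not kernel code: the bit values are the ones defined in mainline Linux 4.19 `include/linux/mm.h` (an assumption here, since the tested kernel is not a community build), and the skip condition mirrors mainline 4.19's `can_madv_dontneed_vma()`, which `__oom_reap_task_mm` consults. The names `vma_sample`, `vma_is_reapable`, `reapable_bytes`, and `trace_vmas` are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* vm_flags bits as defined in mainline Linux 4.19 include/linux/mm.h.
 * Assumption: the vendor kernel in this thread uses the same values. */
#define VM_READ     0x00000001UL
#define VM_WRITE    0x00000002UL
#define VM_MAYREAD  0x00000010UL
#define VM_MAYWRITE 0x00000020UL
#define VM_MAYEXEC  0x00000040UL
#define VM_PFNMAP   0x00000400UL
#define VM_LOCKED   0x00002000UL
#define VM_ACCOUNT  0x00100000UL
#define VM_HUGETLB  0x00400000UL

/* One "flags, range" pair as printed by the trace (hypothetical type). */
struct vma_sample {
    unsigned long flags;
    unsigned long range;
};

/* Model of the mainline 4.19 skip check: a VMA is passed over by the
 * reaper if it is locked, hugetlb-backed, or a raw PFN mapping.
 * The kernel's additional anonymous/shared test is not modeled. */
static bool vma_is_reapable(unsigned long flags)
{
    return !(flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP));
}

/* Sum the ranges of all VMAs the model would let the reaper unmap. */
static unsigned long reapable_bytes(const struct vma_sample *v, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        if (vma_is_reapable(v[i].flags))
            total += v[i].range;
    return total;
}

/* The five VMAs printed by the trace in this message. */
static const struct vma_sample trace_vmas[] = {
    { 1048691UL, 65536UL },       /* 0x100073 */
    { 1048691UL, 196608UL },
    { 1056883UL, 3221225472UL },  /* 0x102073: VM_LOCKED set */
    { 112UL,     65536UL },       /* 0x70: VM_MAY* bits only */
    { 1048691UL, 8388608UL },
};
```

Under these assumptions, 0x102073 and 0x100073 differ only in the `VM_LOCKED` bit (0x2000), and `reapable_bytes(trace_vmas, 5)` adds up to about 8.3 MiB while the single 3 GiB range is skipped.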

            
            oom02 creates `nr_cpu` threads, and each thread uses mmap to
            allocate memory. mmap merges contiguous VMAs into one, so as long
            as one thread is still running, the entire VMA will not be
            released.
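
For readers unfamiliar with the test, the per-thread allocation pattern described above can be sketched roughly as follows. This is a simplified illustration, not the LTP source: `alloc_chunk` and its `lock` parameter are hypothetical names, and the real test's sizes and error handling differ.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map `length` bytes of private anonymous memory,
 * optionally mlock() it (which sets VM_LOCKED on the VMA), then touch
 * every byte so the pages are actually populated.
 * Returns NULL on failure. */
static void *alloc_chunk(size_t length, int lock)
{
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

    if (lock && mlock(p, length) == -1) {
        munmap(p, length);
        return NULL;
    }

    memset(p, 'a', length);
    return p;
}
```

When several threads call such a helper with identical protection and flags, the kernel is free to place the new mappings next to each other and merge them into one VMA, which is the situation described above.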

            
            In extreme cases, a crash may occur because the memory is not
            reclaimed.

            My question: is the crash in this case expected behavior, or is it
            a bug (in the kernel or in LTP)?

          
          The LTP oom tests were originally designed to verify that the OOM
          mechanism works as expected for three types of memory allocation
          (normal, mlock, ksm).

          Yes, your analysis is reasonable to some degree: oom_reaper might
          not reclaim a VMA with locked pages even after the process has
          terminated.

          But the exact behavior of oom_reaper, and the conditions under
          which it can or cannot reclaim VMAs, may vary depending on the
          specific kernel version and configuration. So we shouldn't simply
          regard the system panic as a kernel or LTP defect.

          And BTW, which kernel version did you test?

          

    Hi Li Wang,

    Thank you for your reply.
    My kernel version is 4.19, but it is not a community version.

    I have only encountered the crash once; most of the time oom_reaper
    handles it well.
    I tried to find a method or flag to prevent VMA merging during mmap, but
    couldn't find one.
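
There is no mmap flag for this as far as the thread establishes, but one possible workaround, offered here only as an untested sketch, is to end every mapping with a `PROT_NONE` guard page. VMAs are only merged when their flags match, so a guard page with different protection should keep two read-write regions from becoming adjacent and mergeable. The helper name `map_with_guard` is hypothetical, and this has not been verified on the 4.19 kernel discussed here or against oom02 itself.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `length` usable bytes (rounded up to whole
 * pages) followed by one inaccessible guard page. The guard page has
 * different protection bits, so it ends up in its own VMA and sits
 * between this mapping and whatever is mapped next to it.
 * Returns NULL on failure. */
static void *map_with_guard(size_t length)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || length == 0)
        return NULL;

    size_t pg = (size_t)page;
    size_t usable = (length + pg - 1) / pg * pg;

    char *p = mmap(NULL, usable + pg, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Turn the last page into the guard; this splits the VMA. */
    if (mprotect(p + usable, pg, PROT_NONE) == -1) {
        munmap(p, usable + pg);
        return NULL;
    }
    return p;
}
```

If the region is also locked, `mlock()` would be applied to the usable part only. The cost is one extra page of address space and one extra VMA per allocation.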

    
          -- 
          Regards,
          Li Wang

    -- 
thanks,
Gou Hao <gouhao@xxxxxxxxxxxxx>