This is now being handed back to linux-mm for re-assessment. The OpenSSH
devs are saying this is not a fault in their code. It would appear that
the loadable bnx2 module is causing strange oom_adj behaviour, and if
it's affecting this, I wonder what else it might be affecting? The bug
can only be reproduced when the module is in use by actual hardware.
Here is the latest conversation with the OpenSSH devs, which confirms
this definitely falls within the remit of Debian or kernel-mm.
On 31/05/2011 13:25, Gert Doering wrote:
> Hi,
>
> On Tue, May 31, 2011 at 12:11:13PM +0100, Cal Leeming [Simplicity Media Ltd] wrote:
>> Could you point out the line of code where oom_adj_save is set to the
>> original value? I've looked everywhere, and from what I can tell, it's
>> only ever set to INT_MIN and nowhere else is it changed. (C is not my
>> strongest language though, so I have most likely overlooked something.)
>> This is where I got thrown off.
>
> oom_adjust_setup() does this:
>
>     if ((fp = fopen(oom_adj_path, "r+")) != NULL) {
>             if (fscanf(fp, "%d", &oom_adj_save) != 1)
>                     verbose("error reading %s: %s",
>                         oom_adj_path, strerror(errno));
>
> the "fscanf()" call will read an integer ("%d") from the file
named,
> and write that number into the variable being pointed to
(&oom_adj_save).
>
> The loop is a bit tricky to read as it takes different paths
into
> account, and will exit after the first successful update.
>
> fscanf() will return the number of successful conversions, so
if it
> was able to read "one number", the return value is "1", and it
will
> jump to the else block
>
>             else {
>                     rewind(fp);
>                     if (fprintf(fp, "%d\n", value) <= 0)
>                             verbose("error writing %s: %s",
>                                 oom_adj_path, strerror(errno));
>                     else
>                             verbose("Set %s from %d to %d",
>                                 oom_adj_path, oom_adj_save, value);
>             }
>
> where it will overwrite what is in that file with the new value
> ("value"), and then print the "Set ... from -17 to -17" message that
> you saw.
Ah, thank you for explaining this. Makes a lot more sense now :)
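For my own notes, the whole save-then-overwrite dance boils down to a
standalone program like this (a minimal sketch of the pattern, not the
actual OpenSSH code; it assumes a kernel that still exposes
/proc/self/oom_adj rather than only oom_score_adj):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            int oom_adj_save = 0;
            int value = 0;    /* the value written back, as sshd does */
            FILE *fp = fopen("/proc/self/oom_adj", "r+");

            if (fp == NULL)
                    return 1; /* file absent on oom_score_adj-only kernels */
            if (fscanf(fp, "%d", &oom_adj_save) != 1) {
                    /* couldn't parse an integer out of the file */
                    fprintf(stderr, "error reading: %s\n", strerror(errno));
            } else {
                    rewind(fp);
                    if (fprintf(fp, "%d\n", value) <= 0)
                            fprintf(stderr, "error writing: %s\n",
                                strerror(errno));
                    else
                            printf("Set oom_adj from %d to %d\n",
                                oom_adj_save, value);
            }
            fclose(fp);
            return 0;
    }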
>
>
>>> The question here is why sshd is sometimes started with -17 and
>>> sometimes with 0 - that's the bug, not that sshd keeps what it's
>>> given.
>>>
>>> (Ask yourself: if sshd had no idea about oom_adj at all, would
>>> this make it buggy by not changing the value?)
>>
>> This was what I was trying to pinpoint before. I had come to this
>> conclusion myself, sent it to the Debian bug list, and they
>> dismissed it on the grounds that it was an OpenSSH problem...
>
> I must admit that I have no idea what is causing it, but from the
> logs, it very much looks like sshd is started with "-17" in there -
> but only in the problem case.
>
>
>> So far, the buck has been passed from kernel-mm to debian-kernel,
>> to openssh, and now back to debian-kernel lol. The most annoying
>> thing is that you can't get this bug to happen unless you
>> physically test on a machine which requires the bnx2 firmware, so
>> I get the feeling this won't get resolved :/
>
> And *that* strongly points to a bug in the bnx2 stuff - if other
> programs change their behaviour based on the existence of a given
> driver, that does not smell very healthy.
Agreed. I was thinking of adding some debug into the fs/proc/ code
which does a printk on every oom_adj read/write, but I couldn't figure
out how to extract the pid from the task (pointer?).
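For reference, this is roughly what I was attempting (a hedged sketch
against the 2.6.32-era write handler in fs/proc/base.c; the surrounding
variable names are assumed, and task_pid_nr() is the piece I was
missing):

    /* Hypothetical debug lines inside oom_adjust_write() in
     * fs/proc/base.c. get_proc_task() is already used there to fetch
     * the task_struct; task_pid_nr() and task->comm then give the pid
     * and process name for the log line. */
    struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
    if (task) {
            printk(KERN_DEBUG "oom_adj write: pid=%d comm=%s val=%ld\n",
                   task_pid_nr(task), task->comm, oom_adjust);
            put_task_struct(task);
    }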
>
> [..]
>>> Anyway, as a workaround for your system, you can certainly set
>>>
>>>     oom_adj_save = 0;
>>>
>>> in the beginning of port-linux.c / oom_adjust_restore(), to claim
>>> that "hey, this was the saved value to start with" and "restore"
>>> oom_adj to 0 then - but that's just hiding the bug, not fixing it.
>>
>> I'm disappointed this wasn't the correct fix, I honestly thought I
>> had patched it right :(
>
> Well, that's the shorthand - "just ignore everything that happened
> at init / save time, and forcibly write back '0', no matter what was
> there before".
>
>> But, on the other hand, ssh users should really never have a
>> default oom_adj of -17, so maybe 0 should be set as default anyway?
>> If this is not the case, could you give reasons why?
>
> Well, I would say "the default value in there is a matter of local
> policy" - so what if someone wants to make sure that whatever is run
> from sshd is always privileged regarding oom, even if a local
> firefox etc. is running amok and you need to ssh in and kill the GUI
> stuff...
>
> One might opt to run sshd (and all its children) at "-5" (slightly
> special treatment), or "0" (no special treatment), or even at "-17"
> - but that's local policy.
Ah, okay, that makes sense.
>
>
> Mmmh.
>
> Since this seems to be inherited, it might even work if you just
> change the sshd startup script, and insert
>
>     echo 0 >/proc/self/oom_adj
>
> in there, right before it starts the sshd... "local policy at work".
Yeah, I was going to do this, but then I thought "well, if this
problem is occurring for openssh, then what else could it be
affecting?". As you pointed out above, having the oom_adj changed
based on the existence of a driver is really not good.
I will paste this conversation trail into the Debian ticket, and
hopefully it'll help convince them this issue needs fixing.
>
> gert
Thanks again for taking the time to reply!
On 30/05/2011 22:36, Cal Leeming [Simplicity Media Ltd] wrote:
FYI everyone, I found a bug within openssh-server which caused this
problem. I've patched and submitted to the openssh list.
You can find details of this by googling for:
"port-linux.c bug with oom_adjust_restore() - causes real bad
oom_adj - which can cause DoS conditions"
It's extremely strange.. :S
Cal
On 30/05/2011 18:36, Cal Leeming [Simplicity Media Ltd] wrote:
Here is an strace of the SSH process (which is somehow inheriting the
-17 oom_adj on all forked user instances):

(broken server - with bnx2 module loaded)

[pid 2200] [ 7f13a09c9cb0] open("/proc/self/oom_adj", O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid 2120] [ 7f13a09c9f00] write(7, "\0\0\2\240\n\n\n\nPort 22\n\n\n\nProtocol 2\n\nH"..., 680 <unfinished ...>
[pid 2200] [ 7f13a09c9cb0] <... open resumed> ) = 9
[pid 2120] [ 7f13a09c9f00] <... write resumed> ) = 680
[pid 2120] [ 7f13a09c9e40] close(7 <unfinished ...>
[pid 2200] [ 7f13a09c9844] fstat(9, <unfinished ...>
[pid 2120] [ 7f13a09c9e40] <... close resumed> ) = 0
[pid 2200] [ 7f13a09c9844] <... fstat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 2120] [ 7f13a09c9e40] close(8 <unfinished ...>
[pid 2200] [ 7f13a09d2a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 2120] [ 7f13a09c9e40] <... close resumed> ) = 0
[pid 2200] [ 7f13a09d2a2a] <... mmap resumed> ) = 0x7f13a25a6000
[pid 2120] [ 7f13a09c9e40] close(4 <unfinished ...>
[pid 2200] [ 7f13a09c9f00] write(9, "-17\n", 4 <unfinished ...>
(working server - with bnx2 module unloaded)

[pid 1323] [ 7fae577fbe40] close(7) = 0
[pid 1631] [ 7fae577fbcb0] open("/proc/self/oom_adj", O_WRONLY|O_CREAT|O_TRUNC, 0666 <unfinished ...>
[pid 1323] [ 7fae577fbf00] write(8, "\0\0\2\217\0", 5 <unfinished ...>
[pid 1631] [ 7fae577fbcb0] <... open resumed> ) = 10
[pid 1323] [ 7fae577fbf00] <... write resumed> ) = 5
[pid 1323] [ 7fae577fbf00] write(8, "\0\0\2\206\n\n\n\nPort 22\n\n\n\nProtocol 2\n\nH"..., 654 <unfinished ...>
[pid 1631] [ 7fae577fb844] fstat(10, <unfinished ...>
[pid 1323] [ 7fae577fbf00] <... write resumed> ) = 654
[pid 1631] [ 7fae577fb844] <... fstat resumed> {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 1323] [ 7fae577fbe40] close(8) = 0
[pid 1631] [ 7fae57804a2a] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 1323] [ 7fae577fbe40] close(9 <unfinished ...>
[pid 1631] [ 7fae57804a2a] <... mmap resumed> ) = 0x7fae593d9000
[pid 1323] [ 7fae577fbe40] <... close resumed> ) = 0
[pid 1323] [ 7fae577fbe40] close(5 <unfinished ...>
[pid 1631] [ 7fae577fbf00] write(10, "0\n", 2 <unfinished ...>
The two servers are *EXACT* duplicates of each other, completely
fresh Debian installs, with exactly the same packages installed.
As you can see, the working server sends "0" into the oom_adj and
the broken one sends "-17".
On 30/05/2011 15:27, Cal Leeming [Simplicity Media Ltd] wrote:
I FOUND THE PROBLEM!!!
Explicit details can be found on the Debian kernel mailing list, but
to cut it short, it's caused by the firmware-bnx2 kernel module:
The broken server uses 'firmware-bnx2'.. so I purged the bnx2
package, removed the bnx*.ko files from /lib/modules, ran
update-initramfs, and then rebooted (I then confirmed it was removed
by checking ifconfig and lsmod).
And guess what.. IT WORKED.
So, this problem seems to be caused by the firmware-bnx2 module being
loaded.. somehow, that module is causing -17 oom_adj to be set for
everything.
WTF?! Surely a bug? Could someone please forward this to the
appropriate person for the bnx2 kernel module, as I wouldn't even
know where to begin :S
Cal
On 30/05/2011 11:52, Cal Leeming [Simplicity Media Ltd] wrote:
-resent due to incorrect formatting, sorry if this dups!
@Kame
Thanks for the reply!
Both kernels used the same env/dist, but with slightly different
packages.
After many frustrating hours, I have pinpointed this down to a dodgy
Debian package which appears to continue affecting the system, even
after purging. I'm still yet to pinpoint the package (I'm doing
several reinstall tests, along with tripwire analysis after each
reboot).
@Hiroyuki
Thank you for sending this to the right people!
@linux-mm
On a side note, would someone mind taking a few minutes to give a
brief explanation as to how the default oom_adj is set, and under
what conditions it is given -17 by default? Is this defined by the
application? I looked through the kernel source, and noticed some of
the code was defaulted to set oom_adj to OOM_DISABLE (which is
defined in the headers as -17).
Assuming the Debian problem is resolved, this might be another call
for the oom-killer to be modified so that if it encounters the
unrecoverable loop, it ignores the -17 rule (with some exceptions,
such as kernel processes and other critical things). If this is going
to be a relatively simple task, I wouldn't mind spending a few hours
patching this.
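For reference, the inheritance everyone keeps mentioning is easy to
demonstrate from userspace with a small sketch like this (it assumes
only a kernel that still exposes /proc/self/oom_adj):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Read this process's current oom_adj value. */
    static int read_oom_adj(void)
    {
            FILE *fp = fopen("/proc/self/oom_adj", "r");
            int v = 0;

            if (fp != NULL) {
                    if (fscanf(fp, "%d", &v) != 1)
                            v = 0;
                    fclose(fp);
            }
            return v;
    }

    int main(void)
    {
            printf("parent oom_adj=%d\n", read_oom_adj());
            if (fork() == 0) {
                    /* the child prints the same value it inherited */
                    printf("child  oom_adj=%d\n", read_oom_adj());
                    _exit(0);
            }
            wait(NULL);
            return 0;
    }

If sshd starts life at -17, every forked user session will print -17
here too, which matches the strace output above.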
Cal
On Mon, May 30, 2011 at 3:23 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
Thank you. Memory cgroup and OOM troubles are handled in linux-mm.
On Sun, 29 May 2011 23:24:07 +0100 "Cal Leeming [Simplicity Media
Ltd]" <cal.leeming@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Some further logs:
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369927] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.369939] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399285] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.399296] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428690] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.428702] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487696] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.487708] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517023] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.517035] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546379] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:38 vicky kernel: [ 2283.546391] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310789] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.310804] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369918] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.369930] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399284] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.399296] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433634] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.433648] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463947] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.463959] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493439] redis-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-17
./log/syslog:May 30 07:44:43 vicky kernel: [ 2288.493451] [<ffffffff810b12b7>] ? oom_kill_process+0x82/0x283
Hmm, in short, applications have an oom_adj of -17 by default with
2.6.32.41? AFAIK, no kernel has such crazy settings as default..
Do your 2 kernels use the same environment/distribution?
Thanks,
-Kame
On 29/05/2011 22:50, Cal Leeming [Simplicity Media Ltd] wrote:
First of all, my apologies if I have submitted this problem to the
wrong place; I spent 20 minutes trying to figure out where it needed
to be sent, and was still none the wiser.
The problem is related to applying memory limitations within a
cgroup. If the OOM killer kicks in, it gets stuck in a loop where it
tries to kill a process which has an oom_adj of -17. This causes an
infinite loop, which in turn locks up the system.
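(To illustrate the loop: below is a paraphrased sketch of the victim
selection behaviour, NOT the literal 2.6.32 mm/oom_kill.c, and
task_oom_adj() is a made-up accessor because the field's real home
moved between task_struct, mm_struct and signal_struct across 2.6.x
releases.)

    /* Paraphrased sketch: any task at OOM_DISABLE (-17) is skipped,
     * so if every candidate in the cgroup carries -17, no victim is
     * ever chosen and the OOM path retries forever. */
    retry:
            chosen = NULL;
            chosen_points = 0;
            for_each_process(p) {
                    if (task_oom_adj(p) == OOM_DISABLE)
                            continue;               /* never killable */
                    points = badness(p, uptime);
                    if (points > chosen_points) {
                            chosen = p;
                            chosen_points = points;
                    }
            }
            if (chosen == NULL)
                    goto retry;     /* all tasks at -17 -> infinite loop */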
May 30 03:13:08 vicky kernel: [ 1578.117055] Memory cgroup out of memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117154] Memory cgroup out of memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117248] Memory cgroup out of memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117343] Memory cgroup out of memory: kill process 6016 (java) score 0 or a child
May 30 03:13:08 vicky kernel: [ 1578.117441] Memory cgroup out of memory: kill process 6016 (java) score 0 or a child
root@vicky [/home/foxx]> uname -a
Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 GNU/Linux
(this happens on both the grsec patched and non-patched 2.6.32.41 kernel)
When this is encountered, the memory usage across the whole server is
still within limits (not even hitting swap).
The memory configuration for the cgroup/lxc is:
lxc.cgroup.memory.limit_in_bytes = 3000M
lxc.cgroup.memory.memsw.limit_in_bytes = 3128M
Now, what is even more strange is that when running under the
2.6.32.28 kernel (both patched and unpatched), this problem doesn't
happen. However, there is a slight difference between the two
kernels. The 2.6.32.28 kernel gives a default of 0 in
/proc/X/oom_adj, whereas the 2.6.32.41 gives a default of -17. I
suspect this is the root cause of why it's showing in the later
kernel, but not the earlier.
To test this theory, I started up the lxc on both servers, and then
ran a one-liner which showed me all the processes with an oom_adj of
-17:
(the below is the older/working kernel)
root@xxxxxxxxxxxxxxxxx [/mnt/encstore/lxc]> uname -a
Linux courtney.internal 2.6.32.28-grsec #3 SMP Fri Feb 18 16:09:07 GMT 2011 x86_64 GNU/Linux
root@xxxxxxxxxxxxxxxxx [/mnt/encstore/lxc]> for x in `find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print $3}'` ; do ps -p $x --no-headers ; done
grep: /proc/1411/task/1411/oom_adj: No such file or directory
grep: /proc/1411/oom_adj: No such file or directory
804 ? 00:00:00 udevd
804 ? 00:00:00 udevd
25536 ? 00:00:00 sshd
25536 ? 00:00:00 sshd
31861 ? 00:00:00 sshd
31861 ? 00:00:00 sshd
32173 ? 00:00:00 udevd
32173 ? 00:00:00 udevd
32174 ? 00:00:00 udevd
32174 ? 00:00:00 udevd
(the below is the newer/broken kernel)
root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> uname -a
Linux vicky 2.6.32.41-grsec #3 SMP Mon May 30 02:34:43 BST 2011 x86_64 GNU/Linux
root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> for x in `find /proc -iname 'oom_adj' | xargs grep "\-17" | awk -F '/' '{print $3}'` ; do ps -p $x --no-headers ; done
grep: /proc/3118/task/3118/oom_adj: No such file or directory
grep: /proc/3118/oom_adj: No such file or directory
895 ? 00:00:00 udevd
895 ? 00:00:00 udevd
1091 ? 00:00:00 udevd
1091 ? 00:00:00 udevd
1092 ? 00:00:00 udevd
1092 ? 00:00:00 udevd
2596 ? 00:00:00 sshd
2596 ? 00:00:00 sshd
2608 ? 00:00:00 sshd
2608 ? 00:00:00 sshd
2613 ? 00:00:00 sshd
2613 ? 00:00:00 sshd
2614 pts/0 00:00:00 bash
2614 pts/0 00:00:00 bash
2620 pts/0 00:00:00 sudo
2620 pts/0 00:00:00 sudo
2621 pts/0 00:00:00 su
2621 pts/0 00:00:00 su
2622 pts/0 00:00:00 bash
2622 pts/0 00:00:00 bash
2685 ? 00:00:00 lxc-start
2685 ? 00:00:00 lxc-start
2699 ? 00:00:00 init
2699 ? 00:00:00 init
2939 ? 00:00:00 rc
2939 ? 00:00:00 rc
2942 ? 00:00:00 startpar
2942 ? 00:00:00 startpar
2964 ? 00:00:00 rsyslogd
2964 ? 00:00:00 rsyslogd
2964 ? 00:00:00 rsyslogd
2964 ? 00:00:00 rsyslogd
2980 ? 00:00:00 startpar
2980 ? 00:00:00 startpar
2981 ? 00:00:00 ctlscript.sh
2981 ? 00:00:00 ctlscript.sh
3016 ? 00:00:00 cron
3016 ? 00:00:00 cron
3025 ? 00:00:00 mysqld_safe
3025 ? 00:00:00 mysqld_safe
3032 ? 00:00:00 sshd
3032 ? 00:00:00 sshd
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3097 ? 00:00:00 mysqld.bin
3113 ? 00:00:00 ctl.sh
3113 ? 00:00:00 ctl.sh
3115 ? 00:00:00 sleep
3115 ? 00:00:00 sleep
3116 ? 00:00:00 .memcached.bin
3116 ? 00:00:00 .memcached.bin
As you can see, it is clear that the newer kernel is setting -17 by
default, which in turn is causing the OOM killer loop.
So I began to try and find what may have caused this problem by
comparing the two sources...
I checked the code for all references to 'oom_adj' and 'oom_adjust'
in both code sets, but found no obvious differences:
grep -R -e oom_adjust -e oom_adj . | sort
Then I checked for references to "-17" in all .c and .h files, and
found a couple of matches, but only one obvious one:
grep -R "\-17" . | grep -e "\.c:" -e "\.h:"
./include/linux/oom.h:#define OOM_DISABLE (-17)
But again, a search for OOM_DISABLE came up with nothing obvious...
In a last-ditch attempt, I did a search for all references to 'oom'
(case-insensitive) in both code bases, then compared the two:
root@annabelle [~/lol/linux-2.6.32.28]> grep -i -R "oom" . | sort -n > /tmp/annabelle.oom_adj
root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> grep -i -R "oom" . | sort -n > /tmp/vicky.oom_adj
and this brought back (yet again) nothing obvious..
root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> md5sum ./include/linux/oom.h
2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
root@annabelle [~/lol/linux-2.6.32.28]> md5sum ./include/linux/oom.h
2a32622f6cd38299fc2801d10a9a3ea8  ./include/linux/oom.h
root@vicky [/mnt/encstore/ssd/kernel/linux-2.6.32.41]> md5sum ./mm/oom_kill.c
1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
root@annabelle [~/lol/linux-2.6.32.28]> md5sum ./mm/oom_kill.c
1ef2c2bec19868d13ec66ec22033f10a  ./mm/oom_kill.c
Could anyone please shed some light as to why the default oom_adj is
now set to -17 (and where it is actually set)? From what I can tell,
the fix for this issue will be either:
1. Allow the OOM killer to override the decision of ignoring
oom_adj == -17 if an unrecoverable loop is encountered.
2. Change the default back to 0.
Again, my apologies if this bug report is slightly unorthodox, or
doesn't follow the usual procedure etc. I can assure you I have tried
my absolute best to give all the necessary information though.
Cal