Dear Mulyadi,

Thank you for your response. Sorry for top-posting; I was waiting for my post to arrive in my mailbox, but it took forever, so I top-posted in my eagerness. You are right that my first application couldn't possibly have crashed at 6GB. I should have said about 15GB. I have many nodes; I just picked the outputs from a couple of them and presented my observation. Below I will try to dissect my observations in the hope that you can help me understand this. My OS concepts have become a little rusty, as I haven't been in touch with them for a while.

Here is a machine that currently has this state:
$ free -g
             total       used       free     shared    buffers     cached
Mem:            23         14          8          0          0          0
-/+ buffers/cache:         14          9
Swap:            0          0          0

I have a program that just eats memory. Here is what happens when I run it:
$ ./eatmemory 8.99G
Eating 8589934592 bytes in chunks of 1024...
Done, press any key to free the memory

$ ./eatmemory 9G
Eating 9663676416 bytes in chunks of 1024...

I believe there is nothing wrong with the above observation: part of the RAM is used by other (presumably running) applications, so only so much is available for my program to run.
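For readers who want to reproduce this kind of test, here is a minimal Python sketch of such a memory eater. It is not the eatmemory program used above, whose source is not shown; the function names parse_size and eat are hypothetical. One assumption is worth spelling out: the 8.99G run reports 8589934592 bytes, which is exactly 8 GiB, so the sketch assumes the fractional part of the size argument is dropped.

```python
import re

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(arg):
    """Parse sizes such as '9G'.

    Assumption: any fractional part is dropped, which would explain why
    the 8.99G run above reports exactly 8 GiB.
    """
    match = re.match(r"(\d+)(?:\.\d*)?([KMG]?)$", arg.strip().upper())
    if match is None:
        raise ValueError("unrecognised size: %r" % arg)
    return int(match.group(1)) * UNITS.get(match.group(2), 1)

def eat(total_bytes, chunk=1024):
    """Allocate total_bytes in chunk-sized pieces and keep them referenced.

    bytearray(n) zero-fills its buffer, so the memory is written to and
    not merely reserved.
    """
    chunks = []
    try:
        for _ in range(total_bytes // chunk):
            chunks.append(bytearray(chunk))
    except MemoryError:
        print("allocation failed after %d bytes" % (len(chunks) * chunk))
    return chunks

# Small on purpose; pass a larger size to stress a node.
size = parse_size("4M")
print("Eating %d bytes in chunks of 1024..." % size)
held = eat(size)
print("Done, holding %d chunks" % len(held))
```

Python adds per-object overhead to every 1024-byte chunk, so the real footprint is somewhat larger than the requested size. On a swapless node the kernel may also kill the process outright instead of letting the allocation fail cleanly, in which case the MemoryError branch never runs.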

But my issue is that nothing other than system services is running on this machine. This renders the node unusable for the next program that runs on it and requests more than 9G, as above. Below is the output of /proc/meminfo from the same machine:

$ cat /proc/meminfo
MemTotal:       24724728 kB
MemFree:         9402768 kB
Buffers:               0 kB
Cached:           217464 kB
SwapCached:            0 kB
Active:         14650896 kB
Inactive:          60456 kB
Active(anon):   14647052 kB
Inactive(anon):    40632 kB
Active(file):       3844 kB
Inactive(file):    19824 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:      14493928 kB
Mapped:            19544 kB
Shmem:            193720 kB
Slab:             109720 kB
SReclaimable:      12300 kB
SUnreclaim:        97420 kB
KernelStack:        2968 kB
PageTables:        39100 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12362364 kB
Committed_AS:   15684044 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      493316 kB
VmallocChunk:   34346062668 kB
HardwareCorrupted:     0 kB
AnonHugePages:  13936640 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7652 kB
DirectMap2M:    25145344 kB
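
The arithmetic behind these numbers can be checked with a short Python sketch. It recomputes the used figure that free reports from the /proc/meminfo fields quoted above, and compares CommitLimit against the documented formula. The helper name parse_meminfo is hypothetical, and the overcommit ratio of 50 is an assumption (the kernel default); the post does not show the sysctl settings.

```python
def parse_meminfo(text):
    """Return {field: integer value} for lines shaped like 'Name:  123 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields

# Values copied from the /proc/meminfo output above (all in kB).
SAMPLE = """\
MemTotal:       24724728 kB
MemFree:         9402768 kB
Buffers:               0 kB
Cached:           217464 kB
SwapTotal:             0 kB
AnonPages:      14493928 kB
CommitLimit:    12362364 kB
Committed_AS:   15684044 kB
AnonHugePages:  13936640 kB
"""

KB_PER_GIB = 1024 * 1024
m = parse_meminfo(SAMPLE)

# "used" on the -/+ buffers/cache line of free(1).
used_kb = m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]
print("used:          %.1f GiB" % (used_kb / KB_PER_GIB))
print("AnonPages:     %.1f GiB" % (m["AnonPages"] / KB_PER_GIB))
print("AnonHugePages: %.1f GiB" % (m["AnonHugePages"] / KB_PER_GIB))

# Assumption: vm.overcommit_ratio is at its default of 50 and no
# hugetlb pages are reserved (HugePages_Total is 0 above).
expected_limit = m["SwapTotal"] + m["MemTotal"] * 50 // 100
print("CommitLimit matches ratio 50:", expected_limit == m["CommitLimit"])
print("Committed_AS above CommitLimit:", m["Committed_AS"] > m["CommitLimit"])
```

To run it against a live node, pass the contents of /proc/meminfo to parse_meminfo instead of SAMPLE. Note that CommitLimit is only enforced when vm.overcommit_memory is set to 2, and the post does not say which mode these nodes use.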

Also here is my ulimit which is unlimited:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 192912
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 81920
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And /proc/self/maps
$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 00:10 67455633                           /bin/cat
0060a000-0060b000 rw-p 0000a000 00:10 67455633                           /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0080a000-0080b000 rw-p 0000a000 00:10 67455633                           /bin/cat
0209f000-020c0000 rw-p 00000000 00:00 0                                  [heap]
36d7e00000-36d7e20000 r-xp 00000000 00:10 67454760                       /lib64/
36d801f000-36d8020000 r--p 0001f000 00:10 67454760                       /lib64/
36d8020000-36d8021000 rw-p 00020000 00:10 67454760                       /lib64/
36d8021000-36d8022000 rw-p 00000000 00:00 0
36d8200000-36d838a000 r-xp 00000000 00:10 67456999                       /lib64/
36d838a000-36d858a000 ---p 0018a000 00:10 67456999                       /lib64/
36d858a000-36d858e000 r--p 0018a000 00:10 67456999                       /lib64/
36d858e000-36d858f000 rw-p 0018e000 00:10 67456999                       /lib64/
36d858f000-36d8594000 rw-p 00000000 00:00 0
7f754caad000-7f754cab0000 rw-p 00000000 00:00 0
7f754cac2000-7f754cac3000 rw-p 00000000 00:00 0
7fff5e496000-7fff5e4ab000 rw-p 00000000 00:00 0                          [stack]
7fff5e5f8000-7fff5e5f9000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

On every machine where I ran into this problem, AnonPages is eating up the memory, in effect shrinking the RAM available for programs to run.
Q) Now my question is: since the previous job/program that ran on these machines has finished or died, my OS concepts tell me that the recently used cached anonymous pages should be released to meet the request of another application asking for memory. What am I missing here?

Also, what I fail to understand is the state in which my diskless and swapless nodes remain: what or who has control over the used-up memory, and why is it not being granted to the next owner of the machine so it can run at full scale? I understand that I will not get all of it, but I should get at least 19GB out of 24GB. Below is the list of top processes on the machine. Looking at it, I don't see any heavy use of memory; the mystery makes me feel dumb.

$ ps aux --sort -rss
root      8402  0.0  0.0 119712 15896 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      9555  0.0  0.0 3508796 4680 ?        S    Aug18   0:33 /usr/sbin/slurmd
useralap   8231  0.0  0.0  27224  4604 pts/0    S    12:06   0:00 -bash
root      8401  0.0  0.0 151052  4264 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --debug-to-files
root      8153  0.0  0.0 111192  3240 pts/0    Ss   12:05   0:00 -bash
root      2078  0.0  0.0 720600  2968 ?        Ssl  Aug10   1:39 automount --pid-file /var/run/
root      1752  0.0  0.0 249344  2784 ?        Sl   Aug10   0:04 /sbin/rsyslogd -i /var/run/ -c 5
useralap   9898  1.0  0.0  26196  1468 pts/0    R+   15:42   0:00 ps aux --sort -rss
root      8150  0.0  0.0 111816  1296 ?        Ss   12:05   0:00 sshd: root@pts/0
munge     2146  0.0  0.0 225004  1292 ?        Sl   Aug10   0:36 /usr/sbin/munged
68        2006  0.0  0.0  41976  1228 ?        Ssl  Aug10   0:10 hald
root      1671  0.0  0.0   9120   976 ?        Ss   Aug10   0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-em1.leases -pf /var/run/
root      8400  0.0  0.0 114288   900 ?        Ss   12:11   0:00 /usr/sbin/sssd -f -D
root      8403  0.0  0.0 105264   876 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root      2174  0.0  0.0  20000   868 ?        Ss   Aug10   0:42 crond
root      2111  0.0  0.0  66188   712 ?        Ss   Aug10   0:00 /usr/sbin/sshd
root      3625  0.0  0.0 334616   648 ?        SLsl Aug10   0:00 /usr/sbin/ibacm
root       863  0.0  0.0  10832   592 ?        S<s  Aug10   0:00 /sbin/udevd -d
root      3391  0.0  0.0  10828   588 ?        S<   Aug10   0:00 /sbin/udevd -d
root      3523  0.0  0.0  10828   588 ?        S<   Aug10   0:00 /sbin/udevd -d
rpcuser   1893  0.0  0.0  25428   464 ?        Ss   Aug10   0:00 rpc.statd
root      1736  0.0  0.0  93176   460 ?        S<sl Aug10   0:07 auditd
root         1  0.0  0.0  23500   452 ?        Ss   Aug10   0:02 /sbin/init
root      1799  0.0  0.0  10912   452 ?        Ss   Aug10   6:45 irqbalance --pid=/var/run/
root      8230  0.0  0.0 165156   448 pts/0    S    12:06   0:00 su - useralap
rpc       1875  0.0  0.0  18976   300 ?        Ss   Aug10   0:02 rpcbind
dbus      1934  0.0  0.0  23484   280 ?        Ss   Aug10   0:00 dbus-daemon --system
root      2199  0.0  0.0  21076   212 ?        Ss   Aug10   0:00 /usr/sbin/atd
root      2207  0.0  0.0  21792   212 ?        S    Aug10   0:24 /usr/sbin/ipmievd sel pidfile=/var/run/
root      2007  0.0  0.0  20400   184 ?        S    Aug10   0:00 hald-runner
root      2043  0.0  0.0  22520   164 ?        S    Aug10   0:00 hald-addon-input: Listening on /dev/input/event0
68        2045  0.0  0.0  18008   148 ?        S    Aug10   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root      1997  0.0  0.0   4080   116 ?        Ss   Aug10   0:00 /usr/sbin/acpid
root      2097  0.0  0.0   6260   116 ?        Ss   Aug10   0:00 /usr/sbin/mcelog --daemon
root      2222  0.0  0.0   4064    76 tty2     Ss+  Aug10   0:00 /sbin/mingetty /dev/tty2
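
To put a number on "no heavy use of memory", the RSS column of the ps output can be summed and compared with AnonPages. The following is a minimal Python sketch; total_rss_kb is a hypothetical helper, and the sample holds only the first three lines of the listing above.

```python
def total_rss_kb(ps_output):
    """Sum the RSS column (6th field, in kB) of `ps aux` output."""
    total = 0
    for line in ps_output.splitlines():
        cols = line.split(None, 10)
        if len(cols) < 6 or not cols[5].isdigit():
            continue  # skip the header line and anything malformed
        total += int(cols[5])
    return total

# First three lines of the ps listing above.
SAMPLE = """\
root      8402  0.0  0.0 119712 15896 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      9555  0.0  0.0 3508796 4680 ?        S    Aug18   0:33 /usr/sbin/slurmd
useralap   8231  0.0  0.0  27224  4604 pts/0    S    12:06   0:00 -bash
"""

rss_kb = total_rss_kb(SAMPLE)
anon_pages_kb = 14493928  # AnonPages from /proc/meminfo above

print("summed RSS: %d kB" % rss_kb)
print("AnonPages:  %d kB" % anon_pages_kb)
```

Feeding the full output of ps aux to total_rss_kb gives the total for all processes. The sum counts pages shared between processes more than once, so it is an upper bound on what processes hold; even so, the largest process in the listing has an RSS of about 15 MB, against roughly 13.8 GiB of AnonPages.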

Please advise and let me know if you need more information.
-best regards!!

Dear All,

I have done quite a bit of reading on the Active memory reported in /proc/meminfo. In short, what I read says that it is never reclaimed unless absolutely necessary, and that it caches recently used files/pages in memory. However, I fail to understand the consequences I am facing here.

I have diskless and swapless nodes, so all I have to work with is the RAM on the box. The issue that brought me here is investigating why, after running some applications, the memory they used never becomes available to other applications.

In other words, I cannot run any program that requests more memory than what is shown as free in the output of the free command and as MemFree in the output of cat /proc/meminfo.
For example, any program that requires more than 6GB on the first node below, or more than 1GB on the second node below, fails instantly, but works fine if it stays within the limits of free. There is nothing else running on the system other than system processes/services.

             total       used       free     shared    buffers     cached
Mem:            23         17          6          0          0          9
-/+ buffers/cache:          8         15
Swap:            0          0          0

             total       used       free     shared    buffers     cached
Mem:            23         22          1          0          0          0
-/+ buffers/cache:         21          1
Swap:            0          0          0

Since the applications that ran previously are no longer running (even though they died out of memory because they requested more memory than was available), shouldn't the OS treat any memory they used as no longer needed and reclaim it for the next job/program on that machine?

On every machine where I have run into this problem, the output of /proc/meminfo shows that Active memory accounts for the amount shown as used by the free command, and this limits my further runs.

This is driving me insane and making me feel stupid, since I know the OS is smart enough to handle this. What am I missing here? Please advise.

Appreciate any insight into this. 

Best Regards,

Dear Prem,

Welcome to kernelnewbies :) First of all, please don't top-post when replying. Follow what I and the rest of the list members do.

By the way, looking at the free output, I doubt your statement that your first application took 6 GB and the second took 1 GB. Assuming your application doesn't do things like memory locking in kernel space, I guess it takes 20+ GB of RAM.

So, before we go further, could you rerun your applications and use ps or top to see both the VSIZE and RSS they take?

Regarding memory reclaiming: yes, after an app is killed (by any means possible: Ctrl-C, sending a kill/term/quit signal, OOM, etc.), any memory allocated by this task is freed. This happens for both active and inactive pages.


Mulyadi Santosa
Freelance Linux trainer and consultant


