Dear Mulyadi,
Thank you for your response. Sorry for top posting; I was waiting for my post to arrive in my mailbox, but it took forever, so I top posted in my eagerness. You are right that my first application could not possibly have crashed at 6GB. I should have said about 15GB. I have many nodes; I just picked the outputs from a couple of them and presented my observation. Below I will try to dissect my observation in the hope that you can help me understand it. My OS concepts have become a little rusty, as I haven't been in touch with them for a while.
Here is a machine that currently has this state:
$ free -g
             total       used       free     shared    buffers     cached
Mem:            23         14          8          0          0          0
-/+ buffers/cache:         14          9
Swap:            0          0          0
I have a program that just eats memory. Here is what happens when I run it:
$ ./eatmemory 8.99G
Eating 8589934592 bytes in chunks of 1024...
Done, press any key to free the memory
$ ./eatmemory 9G
Eating 9663676416 bytes in chunks of 1024...
Killed
I believe there is nothing wrong with the above observation in itself, because part of the RAM is used by other (presumably running) applications and I only have so much available for my program to run.
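For reference, here is a minimal sketch of what a tool like eatmemory does. The source of eatmemory is not part of this mail, so the function name `eat_memory` and its behaviour are assumptions for illustration only: allocate the requested size in fixed-size chunks, write to each chunk so the memory is really used, and keep every chunk referenced until the caller lets go.

```python
def eat_memory(total_bytes, chunk_size=1024):
    """Allocate total_bytes in chunk_size pieces and keep them all alive.

    Hypothetical stand-in for the eatmemory tool shown above.
    """
    chunks = []
    allocated = 0
    while allocated < total_bytes:
        size = min(chunk_size, total_bytes - allocated)
        block = bytearray(size)       # zero-filled buffer
        block[:] = b"\xff" * size     # write to it so the memory is actually touched
        chunks.append(block)          # keep a reference so nothing is freed early
        allocated += size
    return chunks


if __name__ == "__main__":
    # Deliberately small (16 MiB) so the sketch is safe to run anywhere.
    held = eat_memory(16 * 1024 * 1024)
    print(f"Holding {sum(len(c) for c in held)} bytes in {len(held)} chunks")
    del held  # dropping the references releases the memory
```

The memory is given back as soon as the references are dropped or the process exits, which is the behaviour I would expect from any finished job.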
But my issue is that nothing other than system services is running on this machine. This renders the node unusable for the next program that runs on it and requests more than about 9G, as above. Below is the output of /proc/meminfo from the same machine:
$ cat /proc/meminfo
MemTotal: 24724728 kB
MemFree: 9402768 kB
Buffers: 0 kB
Cached: 217464 kB
SwapCached: 0 kB
Active: 14650896 kB
Inactive: 60456 kB
Active(anon): 14647052 kB
Inactive(anon): 40632 kB
Active(file): 3844 kB
Inactive(file): 19824 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 14493928 kB
Mapped: 19544 kB
Shmem: 193720 kB
Slab: 109720 kB
SReclaimable: 12300 kB
SUnreclaim: 97420 kB
KernelStack: 2968 kB
PageTables: 39100 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12362364 kB
Committed_AS: 15684044 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 493316 kB
VmallocChunk: 34346062668 kB
HardwareCorrupted: 0 kB
AnonHugePages: 13936640 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 7652 kB
DirectMap2M: 25145344 kB
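To make the numbers above easier to compare, here is a minimal sketch that parses meminfo-style text and checks the two eatmemory requests against MemFree. The helper names `parse_meminfo` and `fits_in_free` are hypothetical, and the sample values are copied from the output above rather than read from a live system.

```python
SAMPLE = """\
MemTotal: 24724728 kB
MemFree: 9402768 kB
Cached: 217464 kB
AnonPages: 14493928 kB
Shmem: 193720 kB
CommitLimit: 12362364 kB
Committed_AS: 15684044 kB
AnonHugePages: 13936640 kB
"""


def parse_meminfo(text):
    """Return {field: number} for lines shaped like 'Name: 123 kB'."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def fits_in_free(request_bytes, fields):
    """True if the request (in bytes) is no larger than MemFree (in kB)."""
    return request_bytes // 1024 <= fields["MemFree"]


if __name__ == "__main__":
    # On a live node, replace SAMPLE with open("/proc/meminfo").read().
    info = parse_meminfo(SAMPLE)
    for request in (8589934592, 9663676416):  # the two eatmemory runs above
        print(request, "fits in MemFree:", fits_in_free(request, info))
    share = info["AnonHugePages"] / info["AnonPages"]
    print(f"AnonHugePages share of AnonPages: {share:.1%}")
    print("Committed_AS above CommitLimit:",
          info["Committed_AS"] > info["CommitLimit"])
```

With these values the 8589934592-byte request fits in MemFree and the 9663676416-byte request does not, which matches the two runs above. Almost all of AnonPages is accounted under AnonHugePages, and Committed_AS is above CommitLimit, which here is exactly half of MemTotal.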
Also here is my ulimit which is unlimited:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 192912
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 81920
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
And /proc/self/maps
$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 00:10 67455633 /bin/cat
0060a000-0060b000 rw-p 0000a000 00:10 67455633 /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0080a000-0080b000 rw-p 0000a000 00:10 67455633 /bin/cat
0209f000-020c0000 rw-p 00000000 00:00 0 [heap]
36d7e00000-36d7e20000 r-xp 00000000 00:10 67454760 /lib64/ld-2.12.so
36d801f000-36d8020000 r--p 0001f000 00:10 67454760 /lib64/ld-2.12.so
36d8020000-36d8021000 rw-p 00020000 00:10 67454760 /lib64/ld-2.12.so
36d8021000-36d8022000 rw-p 00000000 00:00 0
36d8200000-36d838a000 r-xp 00000000 00:10 67456999 /lib64/libc-2.12.so
36d838a000-36d858a000 ---p 0018a000 00:10 67456999 /lib64/libc-2.12.so
36d858a000-36d858e000 r--p 0018a000 00:10 67456999 /lib64/libc-2.12.so
36d858e000-36d858f000 rw-p 0018e000 00:10 67456999 /lib64/libc-2.12.so
36d858f000-36d8594000 rw-p 00000000 00:00 0
7f754caad000-7f754cab0000 rw-p 00000000 00:00 0
7f754cac2000-7f754cac3000 rw-p 00000000 00:00 0
7fff5e496000-7fff5e4ab000 rw-p 00000000 00:00 0 [stack]
7fff5e5f8000-7fff5e5f9000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
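Note that /proc/self/maps only describes the process that reads it (here, cat), so it says nothing about memory held elsewhere on the node. Still, as a minimal sketch of how a maps dump can be turned into sizes, the hypothetical helper `parse_maps` below converts each address range to a byte count. The sample lines are copied from the output above.

```python
SAMPLE_MAPS = """\
0060b000-0060c000 rw-p 00000000 00:00 0
0209f000-020c0000 rw-p 00000000 00:00 0 [heap]
7fff5e496000-7fff5e4ab000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Yield (size_in_bytes, perms, name) for each line of a maps dump."""
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue  # not a mapping line
        start, end = (int(x, 16) for x in parts[0].split("-"))
        # Mappings without a path or tag are anonymous.
        name = parts[5].strip() if len(parts) > 5 else "[anon]"
        yield end - start, parts[1], name


if __name__ == "__main__":
    # For a live process, read /proc/<pid>/maps instead of SAMPLE_MAPS.
    for size, perms, name in parse_maps(SAMPLE_MAPS):
        print(f"{size:>10} {perms} {name}")
```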
On every machine where I ran into this problem, anonymous pages (AnonPages) are eating up the memory, in effect shrinking the RAM available for programs to run.
Q) Now my question: since the previous job/program that ran on this machine has finished or died, my OS concepts tell me that the recently used, cached anonymous pages should be released to meet the request of another application asking for memory. What am I missing here?
Also, what I fail to understand is the state in which my diskless and swapless nodes remain: what or who has control over the used-up memory, and why is it not being granted to the next owner of the machine so that it can run at full scale? I understand that I will not get all of it, but I expect at least 19GB out of 24GB. Below is the list of top processes on the machine. Looking at it, I don't see any heavy use of memory. The mystery makes me feel dumb.
$ ps aux --sort -rss
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 8402 0.0 0.0 119712 15896 ? S 12:11 0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root 9555 0.0 0.0 3508796 4680 ? S Aug18 0:33 /usr/sbin/slurmd
useralap 8231 0.0 0.0 27224 4604 pts/0 S 12:06 0:00 -bash
root 8401 0.0 0.0 151052 4264 ? S 12:11 0:00 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --debug-to-files
root 8153 0.0 0.0 111192 3240 pts/0 Ss 12:05 0:00 -bash
root 2078 0.0 0.0 720600 2968 ? Ssl Aug10 1:39 automount --pid-file /var/run/autofs.pid
root 1752 0.0 0.0 249344 2784 ? Sl Aug10 0:04 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
useralap 9898 1.0 0.0 26196 1468 pts/0 R+ 15:42 0:00 ps aux --sort -rss
root 8150 0.0 0.0 111816 1296 ? Ss 12:05 0:00 sshd: root@pts/0
munge 2146 0.0 0.0 225004 1292 ? Sl Aug10 0:36 /usr/sbin/munged
68 2006 0.0 0.0 41976 1228 ? Ssl Aug10 0:10 hald
root 1671 0.0 0.0 9120 976 ? Ss Aug10 0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-em1.leases -pf /var/run/dhclient-em1.pid
root 8400 0.0 0.0 114288 900 ? Ss 12:11 0:00 /usr/sbin/sssd -f -D
root 8403 0.0 0.0 105264 876 ? S 12:11 0:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root 2174 0.0 0.0 20000 868 ? Ss Aug10 0:42 crond
root 2111 0.0 0.0 66188 712 ? Ss Aug10 0:00 /usr/sbin/sshd
root 3625 0.0 0.0 334616 648 ? SLsl Aug10 0:00 /usr/sbin/ibacm
root 863 0.0 0.0 10832 592 ? S<s Aug10 0:00 /sbin/udevd -d
root 3391 0.0 0.0 10828 588 ? S< Aug10 0:00 /sbin/udevd -d
root 3523 0.0 0.0 10828 588 ? S< Aug10 0:00 /sbin/udevd -d
rpcuser 1893 0.0 0.0 25428 464 ? Ss Aug10 0:00 rpc.statd
root 1736 0.0 0.0 93176 460 ? S<sl Aug10 0:07 auditd
root 1 0.0 0.0 23500 452 ? Ss Aug10 0:02 /sbin/init
root 1799 0.0 0.0 10912 452 ? Ss Aug10 6:45 irqbalance --pid=/var/run/irqbalance.pid
root 8230 0.0 0.0 165156 448 pts/0 S 12:06 0:00 su - useralap
rpc 1875 0.0 0.0 18976 300 ? Ss Aug10 0:02 rpcbind
dbus 1934 0.0 0.0 23484 280 ? Ss Aug10 0:00 dbus-daemon --system
root 2199 0.0 0.0 21076 212 ? Ss Aug10 0:00 /usr/sbin/atd
root 2207 0.0 0.0 21792 212 ? S Aug10 0:24 /usr/sbin/ipmievd sel pidfile=/var/run/ipmievd.pid
root 2007 0.0 0.0 20400 184 ? S Aug10 0:00 hald-runner
root 2043 0.0 0.0 22520 164 ? S Aug10 0:00 hald-addon-input: Listening on /dev/input/event0
68 2045 0.0 0.0 18008 148 ? S Aug10 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root 1997 0.0 0.0 4080 116 ? Ss Aug10 0:00 /usr/sbin/acpid
root 2097 0.0 0.0 6260 116 ? Ss Aug10 0:00 /usr/sbin/mcelog --daemon
root 2222 0.0 0.0 4064 76 tty2 Ss+ Aug10 0:00 /sbin/mingetty /dev/tty2
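To put a number on "no heavy use of memory", here is a minimal sketch that adds up the RSS column of `ps aux` output. The helper name `total_rss_kb` is hypothetical, and the sample holds only the first three processes from the listing above. RSS counts shared pages once per process, so the sum is an upper bound rather than an exact figure.

```python
SAMPLE_PS = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 8402 0.0 0.0 119712 15896 ? S 12:11 0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root 9555 0.0 0.0 3508796 4680 ? S Aug18 0:33 /usr/sbin/slurmd
useralap 8231 0.0 0.0 27224 4604 pts/0 S 12:06 0:00 -bash
"""

ANON_PAGES_KB = 14493928  # AnonPages from the /proc/meminfo output above


def total_rss_kb(ps_output):
    """Sum the RSS column (kB) of `ps aux` output, skipping the header."""
    total = 0
    for line in ps_output.splitlines():
        cols = line.split(None, 10)  # COMMAND may contain spaces
        if len(cols) < 11 or not cols[5].isdigit():
            continue  # header or malformed line
        total += int(cols[5])
    return total


if __name__ == "__main__":
    rss = total_rss_kb(SAMPLE_PS)
    print(f"Summed RSS: {rss} kB")
    print(f"AnonPages:  {ANON_PAGES_KB} kB")
    print(f"Not explained by these processes: {ANON_PAGES_KB - rss} kB")
```

Summing all the processes listed above gives roughly 54,000 kB of RSS, which is nowhere near the roughly 14GB reported as AnonPages.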
Please advise and let me know if you need more information.
-best regards!!
On Wed, Sep 23, 2015 at 11:07 AM, Mulyadi Santosa <mulyadi.santosa@xxxxxxxxx> wrote:
On Wed, Sep 23, 2015 at 4:47 AM, Prem Kumar <prem.it.kumar@xxxxxxxxx> wrote:
Also wondering if there is a way I can list the Active memory map, showing me what is cached?
-regards.
On Tue, Sep 22, 2015 at 3:08 PM, Prem Kumar <prem.it.kumar@xxxxxxxxx> wrote:
Dear All,
I have done quite a bit of reading on Active memory as reported in /proc/meminfo. In short, it says that Active memory is never reclaimed unless absolutely necessary, and that it caches the recently used files/pages in memory. However, I fail to understand the consequences that I face here.
I have disk-less and swap-less nodes, so all I have to play with is the RAM on the box. The issue that brought me here is investigating why, after running some applications, the used memory is never available for use by any other application.
In other words, I cannot run any program that requests more memory than what is shown as free in the output of the free command and as MemFree in the output of cat /proc/meminfo.
For example, if I run any program that requires more than 6GB on the first node below, or more than 1GB on the second node below, it fails instantly, and it works fine if it stays within the limits of free.
There is nothing else running on the system other than system processes/services.
             total       used       free     shared    buffers     cached
Mem:            23         17          6          0          0          9
-/+ buffers/cache:          8         15
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:            23         22          1          0          0          0
-/+ buffers/cache:         21          1
Swap:            0          0          0
Since the applications that ran previously are not running any more ("even though they died out of memory because they requested more memory than available"), shouldn't the OS see any memory used previously as useless, and can it not reclaim that for use by the next job/program on that machine?
On every machine where I have run into this problem, the output of /proc/meminfo shows that Active memory uses up the amount shown as used in the free command and limits my further runs.
This is driving me insane and making me feel stupid, knowing that the OS is smart enough to handle this. What am I missing here? Please advise.
Appreciate any insight into this.
Best Regards,
Prem
Dear Prem
welcome to kernelnewbies :)
First of all, please don't top post when replying. Follow what I and the rest of the list members do.
Btw, looking at the free output, I have a doubt about your statement that your first application took 6 GB and the second took 1 GB. Assuming your application doesn't do things like memory locking in kernel space, I guess it takes 20+ GB of RAM.
So, before we go further, could you re-run your applications and use ps or top to see both the VSIZE and RSS they take?
Regarding memory reclaiming: yes, after an app is killed (by any means possible: ctrl-c, sending a kill/term/quit signal, OOM, etc.), any memory allocated by that task is freed. This happens for both active and inactive pages.
--regards,
Mulyadi Santosa
Freelance Linux trainer and consultant
blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
_______________________________________________ Kernelnewbies mailing list Kernelnewbies@xxxxxxxxxxxxxxxxx http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies