On 18.08.2012 10:29, Alex Bligh wrote:
--On 18 August 2012 10:04:12 +0200 Denis BUCHER <dbucherml@xxxxxxxxxxxxx> wrote:
And whatever page it is, is there some explanation why Apache can take all the server memory? PHP is limited, so how is it possible for Apache to do that?
Right, but it's taking lots of virtual memory, and that appears to be larger than your physical memory.
Yes, the process is using swap (it takes all physical memory and all swap just for itself!)
The pmap manpage is woefully short but I think that's what it's saying. That itself isn't a problem. Perhaps it's just mmap'ing a large file, for instance? Or allocating 3G of virtual memory with an anonymous mmap (which might be what '[anon]' means). Assuming you are on a 64 bit server, you are never likely to run out of virtual memory.
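To illustrate the distinction drawn above between virtual and physical memory, here is a minimal Python sketch of an anonymous mmap. The function names `reserve_anonymous` and `touch_pages` are hypothetical, chosen only for this illustration, and nothing here is taken from Apache or PHP. The assumption is that on Linux a fresh anonymous mapping mostly costs address space, and physical pages are only needed once the process writes to them.

```python
import mmap


def reserve_anonymous(size_bytes):
    """Create an anonymous mapping (fileno -1 means no backing file).

    This grows the process's virtual size; untouched pages normally
    do not need physical memory yet.
    """
    return mmap.mmap(-1, size_bytes)


def touch_pages(mapping, nbytes):
    """Write one byte per page in the first nbytes of the mapping.

    Writing forces the kernel to back each touched page with real
    memory, which is what makes the resident size grow.
    Returns the number of pages written.
    """
    for offset in range(0, nbytes, mmap.PAGESIZE):
        mapping[offset] = 1
    return (nbytes + mmap.PAGESIZE - 1) // mmap.PAGESIZE
```

Calling `reserve_anonymous(64 * 1024 * 1024)` alone should mainly raise the virtual size of the process; only `touch_pages` on that mapping should raise the resident size. A process whose resident size is already near physical memory, as in this thread, is in the second situation.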
I don't know what you mean by "just mmap'ing a large file". Whatever it is, it's a HUGE problem!
So, is it writing to that and actually using huge quantities of physical memory? I.e., does RSIZE in 'top' grow huge?
How can I see RSIZE in top? I don't see it.
But when it is happening, %MEM is something like 99% in top:
top - 10:57:33 up 51 days, 16:17, 2 users, load average: 1.29, 0.34, 0.16
Tasks: 142 total, 1 running, 140 sleeping, 1 stopped, 0 zombie
Cpu(s): 1.6%us, 3.7%sy, 0.0%ni, 0.0%id, 94.3%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4041804k total, 4020816k used, 20988k free, 368k buffers
Swap: 4192956k total, 857500k used, 3335456k free, 36332k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
25178 apache    18   0 4109m 3.0g 2968 D  7.0 78.8   0:09.13 httpd
  234 root      10  -5     0    0    0 D  3.0  0.0 203:22.33 kswapd0
    1 root      15   0 10364   80   56 S  0.0  0.0   0:10.65 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.05 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:00.05 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.12 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 S  0.0  0.0   0:00.07 events/0
    9 root      10  -5     0    0    0 S  0.0  0.0   0:00.26 events/1
   10 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
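For reading the listing above: in Linux top the resident size is shown in the RES column, which some other versions of top label RSIZE. The following is a rough Python sketch for comparing those fields, assuming the usual top convention that a bare number is in KiB and that the suffixes k, m and g mean KiB, MiB and GiB. The helper names `top_size_to_kib` and `resident_share` are made up for this example.

```python
# Multipliers to convert a top size suffix into KiB.
UNIT_KIB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def top_size_to_kib(field):
    """Convert a top size field such as '4109m', '3.0g' or '2968' to KiB."""
    text = field.strip().lower()
    if text and text[-1] in UNIT_KIB:
        return float(text[:-1]) * UNIT_KIB[text[-1]]
    # No suffix: top reports the value in KiB already.
    return float(text)


def resident_share(res_field, mem_total_field):
    """Resident size as a percentage of total physical memory."""
    return 100.0 * top_size_to_kib(res_field) / top_size_to_kib(mem_total_field)
```

With the values from the listing, `top_size_to_kib("4109m")` is larger than `top_size_to_kib("4041804k")`, so the httpd process's virtual size alone exceeds physical memory, and `resident_share("3.0g", "4041804k")` lands close to the %MEM that top prints (top rounds RES to one decimal, so the two will not match exactly).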
If not, I wouldn't worry.
???!!!! Maybe I forgot to tell you that the server is COMPLETELY DOWN AND UNRESPONSIVE for two minutes every five minutes. Nobody is able to reach it, neither users nor employees, and we can't even type a character over SSH during all that time!!! All web pages come back with "connection reset" or with no answer at all... In fact, I am _extremely_ worried!
If so, I'd guess something is allocating a large object or several large objects, as IIRC libc uses anonymous mmap rather than brk only for large allocations. I wouldn't necessarily trust PHP's memory_limit. So, either a memory leak in PHP, Apache or the Postgres client, or you have a script sending back a huge reply, or a DB query with a huge reply, or something else. What I'd do is wait until it happens again, strace the pid concerned, and see what it's doing.
Yes, it's very easy to make it happen again: I just need to remove the Googlebot IPs from the firewall DROP list ;-)
But how can I strace it?
I really don't know how to do that.
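As a starting point for the strace question, here is a hedged Python sketch that only builds a strace command line for attaching to a running process. The function name `build_strace_command` and the default output path are hypothetical. The strace options used (`-f` to follow children, `-tt` for timestamps, `-e trace=...` to filter system calls, `-p` to attach to a PID, `-o` to write to a file) are standard ones. Limiting the trace to mmap, munmap and brk is an assumption that memory allocation is the interesting part; pass `syscalls=None` to trace everything.

```python
def build_strace_command(pid, output_path="/tmp/httpd.strace",
                         syscalls=("mmap", "munmap", "brk")):
    """Return the argv list for attaching strace to an existing process.

    pid         -- process id as shown in the PID column of top
    output_path -- file that strace writes the trace to (hypothetical default)
    syscalls    -- system calls to keep, or None to trace all of them
    """
    cmd = ["strace", "-f", "-tt"]
    if syscalls:
        cmd += ["-e", "trace=" + ",".join(syscalls)]
    cmd += ["-p", str(int(pid)), "-o", output_path]
    return cmd


# Typical use, run as root while the process is misbehaving:
#   import subprocess
#   subprocess.run(build_strace_command(25178))
# Stop it with Ctrl-C, then read the output file.
```

The PID 25178 in the comment is just the httpd PID from the top output earlier in this message; it will be different each time the problem occurs. The same command can of course be typed directly in a shell by joining the list with spaces.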
Thanks a lot for your help, it's very kind of you!
Denis