Re: System freezes when RAM is full (64-bit)

Ivan Danov <huhavel@xxxxxxxxx> · Fri, 12 Apr 2013 14:38:00 +0200

On 12/04/13 13:11, Michal Hocko wrote:
On Fri 12-04-13 12:49:30, Ivan Danov wrote:
$ cat /proc/sys/vm/swappiness
60
OK, thanks for confirming this. It is really strange that we do not swap
almost at all, then.

I have increased my swap partition from nearly 2GB to around 16GB,
but the problem remains.
Increasing the swap partition will not help much as it almost unused
with 2G already (at least last data shown that).

Here I attach the logs for the larger swap partition. I use a MATLAB
script to simulate the problem, but it also works in Octave:
X = ones(100000,10000);
AFAIU this will create a matrix with 10^9 elements and initialize them
to 1. I am not familiar with octave but do you happen to know what is
the data type used for the element? 8B? It would be also interesting to
know how is the matrix organized and initialized. Does it fit into
memory at all?
Yes, 8B each, so it will be almost 8GB and it should fit into the 
memory. I don't know details how it actually works, but if it cannot 
create the matrix, MATLAB complains about that. Since it starts 
complaining even after 2000 more in the second dimension, maybe it needs 
the RAM to create it all. However on the desktop machine, both RAM and 
swap are being used (quite a lot of them both).

I have tried to simulate the problem on a desktop installation with
4GB of RAM, 10GB of swap partition, installed Ubuntu Lucid and then
upgraded to 12.04, the problem isn't there, but the input is still
quite choppy during the load. After the script finishes, everything
looks fine. For the desktop installation the hard drive is not an
SSD hard drive.
What is the kernel version used here?
$ uname -a
Linux ivan 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

On 12/04/13 12:20, Michal Hocko wrote:
[CCing Mel and Johannes]
On Fri 05-04-13 22:41:37, Ivan Danov wrote:
Here you can find attached the script, collecting the logs and the
logs themselves during the described process of freezing. It
appeared that the previous logs are corrupted, because both
/proc/vmstat and /proc/meminfo have been logging to the same file.
Sorry for the late reply:
$ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}'
min:165256 max:3254516 avg:1642475.35

So the free memory dropped down to 165M at minimum. This doesn't sound
terribly low and the average free memory was even above 1.5G. But maybe
the memory consumption peak was very short between 2 measured moments.

The peak seems to be around this time:
meminfo.1365194083:MemFree:          650792 kB
meminfo.1365194085:MemFree:          664920 kB
meminfo.1365194087:MemFree:          165256 kB  <<<
meminfo.1365194089:MemFree:          822968 kB
meminfo.1365194094:MemFree:          666940 kB

Let's have a look at the memory reclaim activity
vmstat.1365194085:pgscan_kswapd_dma32 760
vmstat.1365194085:pgscan_kswapd_normal 10444

vmstat.1365194087:pgscan_kswapd_dma32 760
vmstat.1365194087:pgscan_kswapd_normal 10444

vmstat.1365194089:pgscan_kswapd_dma32 5855
vmstat.1365194089:pgscan_kswapd_normal 80621

vmstat.1365194094:pgscan_kswapd_dma32 54333
vmstat.1365194094:pgscan_kswapd_normal 285562

[...]
vmstat.1365194098:pgscan_kswapd_dma32 54333
vmstat.1365194098:pgscan_kswapd_normal 285562

vmstat.1365194100:pgscan_kswapd_dma32 55760
vmstat.1365194100:pgscan_kswapd_normal 289493

vmstat.1365194102:pgscan_kswapd_dma32 55760
vmstat.1365194102:pgscan_kswapd_normal 289493

So the background reclaim was active only twice for a short amount of
time:
- 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone
- 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone

The second one looks sane so we can ignore it for now but the first one
scanned 1074M in normal zone and 209M in the dma32 zone. Either kswapd
had hard time to find something to reclaim or it couldn't cope with the
ongoing memory pressure.

vmstat.1365194087:pgsteal_kswapd_dma32 373
vmstat.1365194087:pgsteal_kswapd_normal 9057

vmstat.1365194089:pgsteal_kswapd_dma32 3249
vmstat.1365194089:pgsteal_kswapd_normal 56756

vmstat.1365194094:pgsteal_kswapd_dma32 14731
vmstat.1365194094:pgsteal_kswapd_normal 221733

...087-...089
	- dma32 scanned 5095, reclaimed 0
	- normal scanned 70177, reclaimed 0
...089-...094
	-dma32 scanned 48478, reclaimed 2876
	- normal scanned 204941, reclaimed 164977

This shows that kswapd was not able to reclaim any page at first and
then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in
dma32 and 80% for normal) although normal zone seems to be doing much
better.

The direct reclaim was active during that time as well:
vmstat.1365194089:pgscan_direct_dma32 0
vmstat.1365194089:pgscan_direct_normal 0

vmstat.1365194094:pgscan_direct_dma32 29339
vmstat.1365194094:pgscan_direct_normal 86869

which scanned 29339 in dma32 and 86869 in normal zone while it reclaimed:

vmstat.1365194089:pgsteal_direct_dma32 0
vmstat.1365194089:pgsteal_direct_normal 0

vmstat.1365194094:pgsteal_direct_dma32 6137
vmstat.1365194094:pgsteal_direct_normal 57677

225M in the normal zone but it was still not effective very much (~20%
for dma32 and 66% for normal).

vmstat.1365194087:nr_written 9013
vmstat.1365194089:nr_written 9013
vmstat.1365194094:nr_written 15387

Only around 24M have been written out during the massive scanning.

So we have two problems here I guess. First is that there is not much
reclaimable memory when the peak consumption starts and then we have
hard times to balance dma32 zone.

vmstat.1365194087:nr_shmem 103548
vmstat.1365194089:nr_shmem 102227
vmstat.1365194094:nr_shmem 100679

This tells us that you didn't have that many shmem pages allocated at
the time (only 404M). So the /tmp backed by tmpfs shouldn't be the
primary issue here.

We still have a lot of anonymous memory though:
vmstat.1365194087:nr_anon_pages 1430922
vmstat.1365194089:nr_anon_pages 1317009
vmstat.1365194094:nr_anon_pages 1540460

which is around 5.5G. It is interesting that the number of these pages
even drops first and then starts growing again (between 089..094 by 870M
while we reclaimed around the same amount). This would suggest that the
load started trashing on swap but:

meminfo.1365194087:SwapFree:        1999868 kB
meminfo.1365194089:SwapFree:        1999808 kB
meminfo.1365194094:SwapFree:        1784544 kB

tells us that we swapped out only 210M after 1365194089. So we had to
reclaim a lot of page cache during that time while the anonymous memory
pressure was really high.

vmstat.1365194087:nr_file_pages 428632
vmstat.1365194089:nr_file_pages 378132
vmstat.1365194094:nr_file_pages 192009

the page cache pages dropped down by 920M which covers the anon increase.

This all suggests that the workload is simply too aggressive and the
memory reclaim doesn't cope with it.

Let's check the active and inactive lists (maybe we are not aging pages properly):
meminfo.1365194087:Active(anon):    5613412 kB
meminfo.1365194087:Active(file):     261180 kB

meminfo.1365194089:Active(anon):    4794472 kB
meminfo.1365194089:Active(file):     348396 kB

meminfo.1365194094:Active(anon):    5424684 kB
meminfo.1365194094:Active(file):      77364 kB

meminfo.1365194087:Inactive(anon):   496092 kB
meminfo.1365194087:Inactive(file):  1033184 kB

meminfo.1365194089:Inactive(anon):   853608 kB
meminfo.1365194089:Inactive(file):   749564 kB

meminfo.1365194094:Inactive(anon):  1313648 kB
meminfo.1365194094:Inactive(file):    82008 kB

While the file LRUs looks good active Anon LRUs seem to be too big. This
either suggests a bug in aging or the working set is really that big.
Considering the previous data (increase during the memory pressure) I
would be inclined to the second option.

Just out of curiosity what is the vm_swappiness setting? You've said
that you have changed that from 0 but it seems like we would swap much
more. It almost looks like the swappiness is 0. Could you double check?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>