$ cat /proc/sys/vm/swappiness
60

I have increased my swap partition from nearly 2 GB to around 16 GB, but the problem remains. Here I attach the logs for the larger swap partition. I use a MATLAB script to simulate the problem, but it also works in Octave:
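For reference, swappiness can be inspected and changed through the standard sysctl interface; the value 10 below is only an illustration, not a setting recommended anywhere in this thread:

$ cat /proc/sys/vm/swappiness            # current value, 60 here
$ sudo sysctl -w vm.swappiness=10        # change it for the running system
$ echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf    # persist across reboots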
X = ones(100000,10000);

I have tried to simulate the problem on a desktop installation with 4 GB of RAM and a 10 GB swap partition, with Ubuntu Lucid installed and then upgraded to 12.04. The problem isn't there, but the input is still quite choppy during the load. After the script finishes, everything looks fine. The desktop machine's hard drive is not an SSD.
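As a rough sanity check of what this one-liner allocates (assuming MATLAB/Octave's default double precision, i.e. 8 bytes per element):

$ awk 'BEGIN { printf "%.2f GiB\n", 100000 * 10000 * 8 / (1024 ^ 3) }'
7.45 GiB

So the matrix alone needs roughly 7.5 GiB, more than the 4 GB of RAM on the desktop machine, which is why the run has to spill into swap.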
On 12/04/13 12:20, Michal Hocko wrote:
[CCing Mel and Johannes]

On Fri 05-04-13 22:41:37, Ivan Danov wrote:
Here you can find attached the script collecting the logs, and the logs themselves, recorded during the described process of freezing. It appears that the previous logs were corrupted, because both /proc/vmstat and /proc/meminfo had been logged to the same file.

Sorry for the late reply:

$ grep MemFree: meminfo.1365194* | awk 'BEGIN{min=9999999}{val=$2; if(val<min)min=val; if(val>max)max=val; sum+=val; n++}END{printf "min:%d max:%d avg:%.2f\n", min, max, sum/n}'
min:165256 max:3254516 avg:1642475.35

So the free memory dropped down to 165M at minimum. This doesn't sound terribly low and the average free memory was even above 1.5G. But maybe the memory consumption peak was very short between 2 measured moments. The peak seems to be around this time:

meminfo.1365194083:MemFree: 650792 kB
meminfo.1365194085:MemFree: 664920 kB
meminfo.1365194087:MemFree: 165256 kB   <<<
meminfo.1365194089:MemFree: 822968 kB
meminfo.1365194094:MemFree: 666940 kB

Let's have a look at the memory reclaim activity:

vmstat.1365194085:pgscan_kswapd_dma32 760
vmstat.1365194085:pgscan_kswapd_normal 10444
vmstat.1365194087:pgscan_kswapd_dma32 760
vmstat.1365194087:pgscan_kswapd_normal 10444
vmstat.1365194089:pgscan_kswapd_dma32 5855
vmstat.1365194089:pgscan_kswapd_normal 80621
vmstat.1365194094:pgscan_kswapd_dma32 54333
vmstat.1365194094:pgscan_kswapd_normal 285562
[...]
vmstat.1365194098:pgscan_kswapd_dma32 54333
vmstat.1365194098:pgscan_kswapd_normal 285562
vmstat.1365194100:pgscan_kswapd_dma32 55760
vmstat.1365194100:pgscan_kswapd_normal 289493
vmstat.1365194102:pgscan_kswapd_dma32 55760
vmstat.1365194102:pgscan_kswapd_normal 289493

So the background reclaim was active only twice for a short amount of time:
- 1365194087 - 1365194094 - 53573 pages in dma32 and 275118 in normal zone
- 1365194098 - 1365194100 - 1427 pages in dma32 and 3931 in normal zone

The second one looks sane so we can ignore it for now, but the first one scanned 1074M in the normal zone and 209M in the dma32 zone. Either kswapd had a hard time finding something to reclaim or it couldn't cope with the ongoing memory pressure.

vmstat.1365194087:pgsteal_kswapd_dma32 373
vmstat.1365194087:pgsteal_kswapd_normal 9057
vmstat.1365194089:pgsteal_kswapd_dma32 3249
vmstat.1365194089:pgsteal_kswapd_normal 56756
vmstat.1365194094:pgsteal_kswapd_dma32 14731
vmstat.1365194094:pgsteal_kswapd_normal 221733

...087-...089
- dma32 scanned 5095, reclaimed 0
- normal scanned 70177, reclaimed 0
...089-...094
- dma32 scanned 48478, reclaimed 2876
- normal scanned 204941, reclaimed 164977

This shows that kswapd was not able to reclaim any page at first and then it reclaimed a lot (644M in 5s) but still very ineffectively (5% in dma32 and 80% for normal), although the normal zone seems to be doing much better.

The direct reclaim was active during that time as well:

vmstat.1365194089:pgscan_direct_dma32 0
vmstat.1365194089:pgscan_direct_normal 0
vmstat.1365194094:pgscan_direct_dma32 29339
vmstat.1365194094:pgscan_direct_normal 86869

which scanned 29339 in dma32 and 86869 in the normal zone while it reclaimed:

vmstat.1365194089:pgsteal_direct_dma32 0
vmstat.1365194089:pgsteal_direct_normal 0
vmstat.1365194094:pgsteal_direct_dma32 6137
vmstat.1365194094:pgsteal_direct_normal 57677

225M in the normal zone, but it was still not very effective (~20% for dma32 and 66% for normal).
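For reference, the interval deltas above can be pulled out of the attached snapshots with a small shell sketch along these lines; the get() helper and the vmstat.<timestamp> file names are assumptions based on the grep output, not the actual collection script:

# kswapd scan/steal deltas and reclaim efficiency between two snapshots
get() { awk -v k="$1" '$1 == k { print $2 }' "vmstat.$2"; }
for zone in dma32 normal; do
    scan=$(( $(get pgscan_kswapd_$zone 1365194094) - $(get pgscan_kswapd_$zone 1365194089) ))
    steal=$(( $(get pgsteal_kswapd_$zone 1365194094) - $(get pgsteal_kswapd_$zone 1365194089) ))
    echo "kswapd $zone: scanned $scan, reclaimed $steal, ~$(( 100 * steal / scan ))% effective"
done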
vmstat.1365194087:nr_written 9013
vmstat.1365194089:nr_written 9013
vmstat.1365194094:nr_written 15387

Only around 24M have been written out during the massive scanning. So we have two problems here, I guess. First, there is not much reclaimable memory when the peak consumption starts, and then we have a hard time balancing the dma32 zone.

vmstat.1365194087:nr_shmem 103548
vmstat.1365194089:nr_shmem 102227
vmstat.1365194094:nr_shmem 100679

This tells us that you didn't have that many shmem pages allocated at the time (only 404M). So the /tmp backed by tmpfs shouldn't be the primary issue here. We still have a lot of anonymous memory though:

vmstat.1365194087:nr_anon_pages 1430922
vmstat.1365194089:nr_anon_pages 1317009
vmstat.1365194094:nr_anon_pages 1540460

which is around 5.5G. It is interesting that the number of these pages even drops first and then starts growing again (between 089..094 by 870M while we reclaimed around the same amount). This would suggest that the load started thrashing on swap, but:

meminfo.1365194087:SwapFree: 1999868 kB
meminfo.1365194089:SwapFree: 1999808 kB
meminfo.1365194094:SwapFree: 1784544 kB

tells us that we swapped out only 210M after 1365194089. So we had to reclaim a lot of page cache during that time while the anonymous memory pressure was really high.

vmstat.1365194087:nr_file_pages 428632
vmstat.1365194089:nr_file_pages 378132
vmstat.1365194094:nr_file_pages 192009

The page cache pages dropped down by 920M, which covers the anon increase. This all suggests that the workload is simply too aggressive and the memory reclaim doesn't cope with it. Let's check the active and inactive lists (maybe we are not aging pages properly):

meminfo.1365194087:Active(anon): 5613412 kB
meminfo.1365194087:Active(file): 261180 kB
meminfo.1365194089:Active(anon): 4794472 kB
meminfo.1365194089:Active(file): 348396 kB
meminfo.1365194094:Active(anon): 5424684 kB
meminfo.1365194094:Active(file): 77364 kB
meminfo.1365194087:Inactive(anon): 496092 kB
meminfo.1365194087:Inactive(file): 1033184 kB
meminfo.1365194089:Inactive(anon): 853608 kB
meminfo.1365194089:Inactive(file): 749564 kB
meminfo.1365194094:Inactive(anon): 1313648 kB
meminfo.1365194094:Inactive(file): 82008 kB

While the file LRUs look good, the active anon LRUs seem to be too big. This either suggests a bug in aging or the working set really is that big. Considering the previous data (the increase during the memory pressure) I would be inclined to the second option.

Just out of curiosity, what is the vm.swappiness setting? You've said that you have changed that from 0, but it seems like we would swap much more. It almost looks like the swappiness is 0. Could you double check?
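A note on units for anyone following the arithmetic: the nr_* counters from /proc/vmstat are in pages (4 KiB on x86), while /proc/meminfo is already in kB, so the sizes quoted in the text follow from simple conversions, e.g.:

$ awk 'BEGIN { printf "%.2f GiB\n", 1430922 * 4096 / (1024 ^ 3) }'
5.46 GiB
$ awk 'BEGIN { printf "%.0f MiB\n", (428632 - 192009) * 4096 / (1024 ^ 2) }'
924 MiB

which matches the ~5.5G of anonymous memory and the ~920M page cache drop mentioned above.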
Attachment:
bug.tar.gz
Description: GNU Zip compressed data