When testing 2.6.37-rc1, I noticed that huge page allocation success
rates were severely impaired. Bisection showed that commit
[d065bd81: mm: retry page fault when blocking on disk transfer] was the
biggest factor. Reverting the patch confirmed this.

Here are the results of a high-order allocation stress test. The vanilla
kernel is 2.6.37-rc1 and the revert kernel has this commit removed, with
minor conflicts cleaned up.

STRESS-HIGHALLOC
                  highalloc-vanilla    revert-d065bd81
Pass 1                7.00 ( 0.00%)     73.00 (66.00%)
Pass 2                7.00 ( 0.00%)     92.00 (85.00%)
At Rest              13.00 ( 0.00%)     93.00 (80.00%)

"Pass 1" and "Pass 2" are the huge page allocation success rates while
the machine is under heavy load. One might expect allocations to fail
there, but even when the machine is fully at rest, with all memory freed
and nothing else going on, the pages still cannot be allocated. I had
ftrace enabled and found the following.

FTrace Reclaim Statistics: vmscan
                                          vanilla  revert-d065bd81
Direct reclaims                              3687              889
Direct reclaim pages scanned             39767013           195182
Direct reclaim pages reclaimed             115079           107891
Direct reclaim write file async I/O         13598             5777
Direct reclaim write anon async I/O         70886            40954
Direct reclaim write file sync I/O              0                0
Direct reclaim write anon sync I/O             37              178
Wake kswapd requests                         6508              868
Kswapd wakeups                               1291              521
Kswapd pages scanned                     77859381          3240330
Kswapd pages reclaimed                    2548099          1965881
Kswapd reclaim write file async I/O         51266            56838
Kswapd reclaim write anon async I/O        935070           392199
Kswapd reclaim write file sync I/O              0                0
Kswapd reclaim write anon sync I/O              0                0
Time stalled direct reclaim (seconds)     1160.57           636.24
Time kswapd awake (seconds)               1453.81           654.25

Total pages scanned                     117626394          3435512
Total pages reclaimed                     2663178          2073772
%age total pages scanned/reclaimed          2.26%           60.36%
%age total pages scanned/written            0.91%           14.44%
%age file pages scanned/written             0.06%            1.82%
Percentage Time Spent Direct Reclaim       25.92%           15.98%
Percentage Time kswapd Awake               65.57%           36.01%

Reverting the commit improves overall reclaim behaviour when allocating
huge pages. Note in particular the low scanned/reclaimed percentage in
the vanilla kernel - only 2.26% of the 117,626,394 pages scanned were
reclaimed, versus 60.36% with the revert - which implies the vanilla
kernel is endlessly scanning pages it cannot reclaim. I also note that
with the vanilla kernel nr_inactive_* remains high, but when the patch
is reverted it drops, implying that the patch is preventing pages from
being reclaimed. Whether or not compaction is used makes no difference -
the figures are still brutal.

I have not yet digested what the patch is doing, but I am reporting this
in case people familiar with the patch can spot the problem quickly.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
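
As a quick sanity check on the ratios quoted above, here is a trivial,
throwaway C snippet (purely illustrative, not part of the test harness)
that reproduces the scanned/reclaimed percentages from the raw counters
in the ftrace table:

#include <stdio.h>

int main(void)
{
        /* Raw counters taken from the ftrace table above */
        unsigned long long vanilla_scanned   = 117626394ULL;
        unsigned long long vanilla_reclaimed =   2663178ULL;
        unsigned long long revert_scanned    =   3435512ULL;
        unsigned long long revert_reclaimed  =   2073772ULL;

        /* Percentage of scanned pages that were actually reclaimed */
        printf("vanilla:         %.2f%%\n",
               100.0 * vanilla_reclaimed / vanilla_scanned);
        printf("revert-d065bd81: %.2f%%\n",
               100.0 * revert_reclaimed / revert_scanned);

        return 0;
}

Compiled and run, it prints roughly 2.26% for vanilla and 60.36% for the
revert, matching the "%age total pages scanned/reclaimed" line.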