On Tue, Jun 21, 2011 at 10:53:02AM +0100, P?draig Brady wrote: > I tried the 2 patches here to no avail: > http://marc.info/?l=linux-mm&m=130503811704830&w=2 > > I originally logged this at: > https://bugzilla.redhat.com/show_bug.cgi?id=712019 > > I can compile up and quickly test any suggestions. > I recently looked through what kswapd does and there are a number of problem areas. Unfortunately, I haven't gotten around to doing anything about it yet or running the test cases to see if they are really problems. In your case, the following is a strong possibility though. This should be applied on top of the two patches merged from that thread. This is not tested in any way, based on 3.0-rc3 ==== CUT HERE ==== mm: vmscan: Stop looping in kswapd if high-order reclaim is failing A number of people have identified a problem whereby kswapd consumes 99% of CPU in a tight loop. It was determined that there are constant sources of high-order allocations but in the event the allocations are failing, kswapd continues to consume CPU and reclaim too much memory. kswapd can and does give up costly high-order reclaim but only if it is failing to make forward progress. This patch tracks how much memory kswapd has reclaimed. If it reclaims 4 times the size of the allocation request, it resets to order-0, balance for that order and will go to sleep unless there has been continued allocation requests. "4 times" is a tad arbitrary but it's down to (1<<PAGE_ALLOC_COSTLY_ORDER)*4 == SWAP_CLUSTER_MAX which is the "standard" unit of reclaim kswapd works on so scale it similarily for the higher orders. Not signed off by as it is barely a prototype --- mm/vmscan.c | 11 ++++++++++- 1 files changed, 10 insertions(+), 1 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index faa0a08..8fb262f 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2376,6 +2376,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, int i; int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ unsigned long total_scanned; + unsigned long total_reclaimed; struct reclaim_state *reclaim_state = current->reclaim_state; unsigned long nr_soft_reclaimed; unsigned long nr_soft_scanned; @@ -2397,6 +2398,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, }; loop_again: total_scanned = 0; + total_reclaimed = 0; sc.nr_reclaimed = 0; sc.may_writepage = !laptop_mode; count_vm_event(PAGEOUTRUN); @@ -2564,6 +2566,7 @@ loop_again: break; } out: + total_reclaimed += sc.nr_reclaimed; /* * order-0: All zones must meet high watermark for a balanced node @@ -2584,12 +2587,18 @@ out: * little point trying all over again as kswapd may * infinite loop. * + * Similarly, if we have reclaimed far more pages than the + * original request size, it's likely that contiguous reclaim + * is not finding the pages it needs and it should give + * up. + * * Instead, recheck all watermarks at order-0 as they * are the most important. If watermarks are ok, kswapd will go * back to sleep. High-order users can still perform direct * reclaim if they wish. */ - if (sc.nr_reclaimed < SWAP_CLUSTER_MAX) + if (sc.nr_reclaimed < SWAP_CLUSTER_MAX || + (order > PAGE_ALLOC_COSTLY_ORDER && total_reclaimed > (4UL << order)) ) order = sc.order = 0; goto loop_again; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>