(Resending as the updated patch 2 appears to have gotten lost in a "twisty
maze of threads all similar" while questing towards mmotm)

Changelog since V3
  o cond_resched in shrink_slab when it does nothing rather than having
    kswapd sleep for HZ/10 when it needs to schedule

Changelog since V2
  o Drop all SLUB latency-reducing patches

Changelog since V1
  o kswapd should sleep if need_resched
  o Remove __GFP_REPEAT from GFP flags when speculatively using high
    orders so direct/compaction exits earlier
  o Remove __GFP_NORETRY for correctness
  o Correct logic in sleeping_prematurely
  o Leave SLUB using the default slub_max_order

There are a few reports of people experiencing hangs when copying large
amounts of data, with kswapd using a large amount of CPU. The hangs appear
to be due to recent reclaim changes. SLUB using high orders is the trigger
but not the root cause, as SLUB has been using high orders for a while. The
root cause was bugs introduced into reclaim, which are addressed by the
following two patches.

Patch 1 corrects logic introduced by commit [1741c877: mm: kswapd: keep
	kswapd awake for high-order allocations until a percentage of the
	node is balanced] to allow kswapd to go to sleep when balanced for
	high orders.

Patch 2 notes that it is possible for kswapd to miss every cond_resched()
	and updates shrink_slab() so it'll at least reach that scheduling
	point.

Chris Wood reports that these two patches in isolation are sufficient to
prevent the system hanging. AFAIK, they should also resolve similar hangs
experienced by James Bottomley.

These should also be considered for -stable for both 2.6.38 and 2.6.39.

-- 
1.7.3.4