Let's see: shrink_page_list() only applies if inactive pages were isolated, which in turn may not happen if all_unreclaimable is set in shrink_zones(). If, for whatever reason, all_unreclaimable is set on all zones, we can miss calling cond_resched(). shrink_slab() only applies if we are reclaiming slab pages. If the first shrinker returns -1, we do not call cond_resched(). If that first shrinker is dcache and __GFP_FS is not set, direct reclaimers will not shrink at all. However, if there are enough of them running, or if one of the other shrinkers runs for a very long time, kswapd could be starved acquiring shrinker_rwsem and never reach the cond_resched().
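To make the shrink_slab() point concrete, here is a minimal userspace model, not kernel code. It assumes the yield point sits after a successful shrinker call inside the batch loop, so a -1 return breaks out before reaching it. The names model_shrink_slab(), dcache_like_shrink() and model_cond_resched() are hypothetical, and sched_yield() is only a stand-in: it always yields, whereas cond_resched() yields only when a reschedule is pending.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

#define SHRINK_BATCH 128 /* batch size assumed for this model */

/* Hypothetical shrinker callback: returns -1 when it cannot make progress. */
typedef int (*shrink_fn)(int nr_to_scan, bool gfp_fs);

/* Userspace stand-in for cond_resched(); counts how often it is reached. */
static void model_cond_resched(int *yields)
{
    (*yields)++;
    sched_yield();
}

/* Behaves like the dcache shrinker as described: refuses without __GFP_FS. */
static int dcache_like_shrink(int nr_to_scan, bool gfp_fs)
{
    (void)nr_to_scan;
    return gfp_fs ? 0 : -1;
}

/* Simplified shape of the shrink_slab() batch loop. The only yield point
 * follows a successful call, so a -1 return skips it entirely. */
static int model_shrink_slab(shrink_fn *shrinkers, int n, int total_scan,
                             bool gfp_fs)
{
    int yields = 0;

    assert(n >= 0 && total_scan >= 0);
    for (int i = 0; i < n; i++) {
        for (int scan = total_scan; scan >= SHRINK_BATCH; scan -= SHRINK_BATCH) {
            if (shrinkers[i](SHRINK_BATCH, gfp_fs) == -1)
                break; /* bail out before reaching the yield point */
            model_cond_resched(&yields);
        }
    }
    return yields;
}
```

With a single dcache-like shrinker and a scan target of 1024, the model reaches its yield point 8 times when gfp_fs is true and never when it is false, which is the gap described above.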
OK.
balance_pgdat() only calls cond_resched() if the zones are not balanced. For a high-order allocation that is balanced, it checks order-0 again. During that window, order-0 might have become unbalanced, so it loops again for order-0 and returns to kswapd() indicating that it was reclaiming for order-0. kswapd() can then find that a caller has rewoken it for a high-order allocation and re-enter balance_pgdat() without ever having called cond_resched().
Then shouldn't balance_pgdat() call cond_resched() unconditionally? The problem is NOT the 100% CPU consumption; if kswapd sleeps, other processes have to reclaim old pages themselves anyway. The problem is that kswapd never triggers a context switch, so other tasks hang.
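The difference between the current placement and an unconditional cond_resched() can be sketched with another userspace model. It assumes a much simplified balance_pgdat() whose only yield point is on the "zones not balanced" path, and it collapses the race described above into "every pass finds all_zones_ok". model_balance_pgdat(), model_kswapd() and the unconditional flag are hypothetical names for illustration, and sched_yield() again only approximates cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Userspace stand-in for cond_resched(); counts how often it is reached. */
static void model_cond_resched(int *yields)
{
    (*yields)++;
    sched_yield();
}

/* One pass of a simplified, hypothetical balance_pgdat(). */
static void model_balance_pgdat(bool all_zones_ok, bool unconditional,
                                int *yields)
{
    if (!all_zones_ok) {
        /* ... reclaim from the unbalanced zones would happen here ... */
        model_cond_resched(yields);   /* existing yield point */
    } else if (unconditional) {
        model_cond_resched(yields);   /* proposed extra yield point */
    }
}

/* kswapd keeps being rewoken and re-entering balance_pgdat() while every
 * check it performs reports the zones as balanced. */
static int model_kswapd(int rewakes, bool unconditional)
{
    int yields = 0;

    assert(rewakes >= 0);
    for (int i = 0; i < rewakes; i++)
        model_balance_pgdat(true, unconditional, &yields);
    return yields;
}
```

In this model, model_kswapd(1000, false) never yields, while model_kswapd(1000, true) yields once per pass. The cost of the unconditional variant is small, since the real cond_resched() returns immediately when no reschedule is pending.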
While it appears unlikely, there are bad conditions which can result in cond_resched() being avoided.