wait_iff_congested() only waits if ZONE_CONGESTED is set (and at least
one BDI is still congested). Modulo concurrent changes to BDI congestion
status:

After this change, the probability that a given shrink_inactive_list()
sets ZONE_CONGESTED increases monotonically with the fraction of dirty
pages on the LRU, to 100% if all dirty pages are backed by a
write-congested BDI. This is in line with what appears to be intended,
judging by the comment:

        /*
         * Tag a zone as congested if all the dirty pages scanned were
         * backed by a congested BDI and wait_iff_congested will stall.
         */
        if (nr_dirty && nr_dirty == nr_congested)
                set_bit(ZONE_CONGESTED, &zone->flags);

Before this change, the probability that a given shrink_inactive_list()
sets ZONE_CONGESTED varies erratically. Because the ZONE_CONGESTED
condition is nr_dirty && nr_dirty == nr_congested, the probability peaks
when the fraction of dirty pages is equal to the fraction of file pages
backed by congested BDIs. So under some circumstances, an increase in
the fraction of dirty pages or in the fraction of congested pages can
actually result in a *decreased* probability that reclaim will stall for
writeback congestion, and vice versa, which is both counterintuitive and
counterproductive.

On Wed, Oct 15, 2014 at 1:05 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 15 Oct 2014 12:58:35 -0700 Jamie Liu <jamieliu@xxxxxxxxxx> wrote:
>
>> shrink_page_list() counts all pages with a mapping, including clean
>> pages, toward nr_congested if they're on a write-congested BDI.
>> shrink_inactive_list() then sets ZONE_CONGESTED if nr_dirty ==
>> nr_congested. Fix this apples-to-oranges comparison by only counting
>> pages for nr_congested if they count for nr_dirty.
>>
>> ...
>>
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -875,7 +875,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>                * end of the LRU a second time.
>>                */
>>               mapping = page_mapping(page);
>> -             if ((mapping && bdi_write_congested(mapping->backing_dev_info)) ||
>> +             if (((dirty || writeback) && mapping &&
>> +                  bdi_write_congested(mapping->backing_dev_info)) ||
>>                   (writeback && PageReclaim(page)))
>>                       nr_congested++;
>
> What are the observed runtime effects of this change?
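
For illustration only, here is a minimal userspace sketch of the two
counting rules compared in this message. It is not kernel code:
struct fake_page, its fields, and the helper names are hypothetical
stand-ins, and it assumes nr_dirty counts pages that are dirty or under
writeback (the patch description says pages should count toward
nr_congested only "if they count for nr_dirty").

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page state consulted during reclaim. */
struct fake_page {
	bool has_mapping;   /* models page_mapping(page) != NULL */
	bool dirty;
	bool writeback;
	bool reclaim;       /* models PageReclaim(page) */
	bool bdi_congested; /* models bdi_write_congested() for the page's BDI */
};

/* Rule before the patch: any mapped page on a congested BDI is counted. */
static bool counts_congested_old(const struct fake_page *p)
{
	return (p->has_mapping && p->bdi_congested) ||
	       (p->writeback && p->reclaim);
}

/* Rule after the patch: only dirty or writeback pages are counted. */
static bool counts_congested_new(const struct fake_page *p)
{
	return ((p->dirty || p->writeback) && p->has_mapping &&
		p->bdi_congested) ||
	       (p->writeback && p->reclaim);
}

/*
 * Returns true if a batch of pages would satisfy the ZONE_CONGESTED
 * condition quoted above: nr_dirty && nr_dirty == nr_congested.
 */
static bool zone_would_be_congested(const struct fake_page *pages, size_t n,
				    bool use_new_rule)
{
	unsigned long nr_dirty = 0, nr_congested = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct fake_page *p = &pages[i];

		/* Assumption: dirty or writeback pages count for nr_dirty. */
		if (p->dirty || p->writeback)
			nr_dirty++;
		if (use_new_rule ? counts_congested_new(p)
				 : counts_congested_old(p))
			nr_congested++;
	}
	return nr_dirty && nr_dirty == nr_congested;
}
```

With a batch of four mapped pages on a congested BDI, two dirty and two
clean, the old rule counts all four pages as congested against two dirty
pages and does not satisfy the condition, while the new rule counts two
and does. Conversely, two dirty pages on an uncongested BDI plus two
clean pages on a congested BDI satisfy the condition under the old rule
only.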