On Thu, Jan 26, 2017 at 10:05:09AM +0000, Mel Gorman wrote:
> On Mon, Jan 23, 2017 at 01:16:39PM -0500, Johannes Weiner wrote:
> > Direct reclaim has been replaced by kswapd reclaim in pretty much all
> > common memory pressure situations, so this code most likely doesn't
> > accomplish the described effect anymore. The previous patch wakes up
> > flushers for all reclaimers when we encounter dirty pages at the tail
> > end of the LRU. Remove the crufty old direct reclaim invocation.
> >
> > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>
> In general I like this. I worried at first that if kswapd is blocked
> writing pages it won't reach the wakeup_flusher_threads, but the
> previous patch handles it.
>
> Now, though, it occurs to me that with the last patch we always write
> out the world when waking the flusher threads. This may not be a great
> idea. Consider for example a heavy writer of short-lived tmp files. In
> such a case, it is possible for the files to be truncated before they
> even hit the disk. However, with multiple "writeout the world" calls,
> they may now be hitting the disk. Furthermore, multiple kswapd and
> direct reclaimers could all be requesting writeout of the world, and
> each request unplugs.
>
> Is it possible to maintain the property of writing back pages relative
> to the number of pages scanned, or have you already determined that
> it's not necessary?

That's what I started out with - waking the flushers for nr_taken. I
was using a silly test case that wrote less than the dirty background
limit and then allocated a burst of anon memory. When the dirty data
is linear, the bigger IO requests are beneficial: they don't exhaust
struct request slots (like kswapd's 4k IO routinely does, and
SWAP_CLUSTER_MAX is only 32), and they require less frequent plugging.

Force-flushing temporary files under memory pressure is a concern -
although the most recently dirtied files would get queued last, still
giving them some time to get truncated - but I'm wary of splitting the
flush requests too aggressively when we DO sustain throngs of dirty
pages hitting the reclaim scanners.

I haven't tested this with the real workload that gave us problems
yet, though, because deploying enough machines to get a good sample
size takes 1-2 days, and running through the full load spectrum takes
another 4-5. So it's harder to fine-tune these patches.

But this is a legit concern. I'll try to find out what happens when we
reduce the wakeups to nr_taken; a sketch of what I mean is below the
sign-off.

Given the problem these patches address, though, would you be okay
with keeping this patch in -mm? We're too far into 4.10 to merge it
upstream now, and I should have data on more precise wakeups before
the next merge window.

Thanks
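
---
For illustration only - this is a rough sketch of the nr_taken
variant, not a tested patch. It assumes the 4.10-era
wakeup_flusher_threads(nr_pages, reason) signature, and the hunk
context in shrink_inactive_list() is paraphrased from the previous
patch rather than quoted verbatim:

	/*
	 * shrink_page_list() found that everything we isolated is
	 * dirty but not yet queued for writeback.  The previous patch
	 * asks the flushers to write back the world (nr_pages == 0):
	 *
	 *	if (stat.nr_unqueued_dirty == nr_taken)
	 *		wakeup_flusher_threads(0, WB_REASON_VMSCAN);
	 *
	 * The variant under discussion scales the request to what
	 * reclaim actually ran into, so bursts of concurrent
	 * reclaimers don't force short-lived tmp files to disk that
	 * would otherwise be truncated before any IO is issued:
	 */
	if (stat.nr_unqueued_dirty == nr_taken)
		wakeup_flusher_threads(nr_taken, WB_REASON_VMSCAN);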