On 01/23/2012 02:33 PM, Johannes Weiner wrote:
> On Sun, Jan 22, 2012 at 09:27:14AM -0600, James Bottomley wrote:
>> Since a lot of these are mm related; added linux-mm to cc list
>>
>> On Sun, 2012-01-22 at 15:50 +0200, Boaz Harrosh wrote:
>>> [Targeted writeback (IO-less page-reclaim)]
>>> Sometimes we would need to write a certain page or group of pages. It could be
>>> nice to prioritize/start the writeback on these pages, through the regular writeback
>>> mechanism instead of doing direct IO like today.
>>>
>>> This is actually related to above where we can have a "write_now" time constant that
>>> makes the priority of that inode to be written first. Then we also need the page-info
>>> that we want to write as part of that inode's IO. Usually today we start at the lowest
>>> indexed page of the inode, right? In targeted writeback we should make sure the writeout
>>> is the longest contiguous (aligned) dirty region containing the targeted page.
>>>
>>> With this in place we can also move to an IO-less page-reclaim. that is done entirely by
>>> the BDI thread writeback. (Need I say more)
>>
>> All of the above are complex. The only reason for adding complexity in
>> our writeback path should be because we can demonstrate that it's
>> actually needed. In order to demonstrate this, you'd need performance
>> measurements ... is there a plan to get these before the summit?
>
> The situations that required writeback for reclaim to make progress
> have shrunk a lot with this merge window because of respecting page
> reserves in the dirty limits, and per-zone dirty limits.
>
> What's left to evaluate are certain NUMA configurations where the
> dirty pages are concentrated on a few nodes. Currently, we kick the
> flushers from direct reclaim, completely undirected, just "clean some
> pages, please".
> That works for systems up to a certain size,
> depending on the size of the node in relationship to the system as a
> whole (likelihood of pages cleaned being from the target node) and how
> fast the backing storage is (impact of cleaning 'wrong' pages).
>
> So while the original problem is still standing, the urgency of it
> might have been reduced quite a bit or the problem itself might have
> been pushed into a corner where workarounds (spread dirty data more
> evenly e.g.) might be more economical than trying to make writeback
> node-aware and deal with all the implications (still have to guarantee
> dirty cache expiration times for integrity; can fail spectacularly
> when there is little or no relationship between disk placement and
> memory placement, imagine round-robin allocation of disk-contiguous
> dirty cache over a few nodes).
>
> I agree with James: find scenarios where workarounds are not feasible
> but that are important enough that the complexity would be justified.
> Otherwise, talking about how to fix them is moot.

Fine, so IO-less page-reclaim is moot. What do I know, I've never seen a
NUMA machine.

But that was just a by-product of half a section of a list of 8 sections.
Are all these moot?

I must be smoking something good ;-)

Thanks
Boaz