Re: [LSF/MM TOPIC] [ATTEND] Future writeback topics

Boaz Harrosh <bharrosh@xxxxxxxxxxx> · Mon, 23 Jan 2012 15:41:50 +0200

On 01/23/2012 02:33 PM, Johannes Weiner wrote:
> On Sun, Jan 22, 2012 at 09:27:14AM -0600, James Bottomley wrote:
>> Since a lot of these are mm related; added linux-mm to cc list
>>
>> On Sun, 2012-01-22 at 15:50 +0200, Boaz Harrosh wrote:
>>> [Targeted writeback (IO-less page-reclaim)]
>>>   Sometimes we need to write a certain page or group of pages. It would be
>>>   nice to prioritize/start writeback of these pages through the regular writeback
>>>   mechanism instead of doing direct IO as we do today.
>>>
>>>   This is actually related to the above, where we can have a "write_now" time constant
>>>   that gives that inode's writeback top priority. Then we also need the page info
>>>   that we want written as part of that inode's IO. Today we usually start at the
>>>   lowest-indexed page of the inode, right? In targeted writeback we should make sure the
>>>   writeout is the longest contiguous (aligned) dirty region containing the targeted page.
>>>
>>>   With this in place we can also move to IO-less page reclaim that is done entirely by
>>>   the BDI thread writeback. (Need I say more?)
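
To make the "longest contiguous (aligned) dirty region" rule a bit more concrete, here is a rough C sketch. It assumes a plain per-page dirty map rather than the page-cache tags the kernel actually uses, and it reads "aligned" as "clipped to the naturally aligned window of `align` pages (e.g. a RAID stripe) that contains the target". `targeted_range()` and its parameters are hypothetical, not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code: given a per-page dirty map for one
 * inode, compute the inclusive page range [*start, *end] of the contiguous
 * dirty run containing `target`, clipped to the naturally aligned window of
 * `align` pages around `target`. `align` must be a non-zero power of two.
 * Returns false if `target` is out of range or not dirty.
 */
static bool targeted_range(const bool *dirty, size_t npages, size_t target,
                           size_t align, size_t *start, size_t *end)
{
    if (target >= npages || !dirty[target])
        return false;

    /* Aligned window [win_lo, win_hi] that contains the target page. */
    size_t win_lo = target & ~(align - 1);
    size_t win_hi = win_lo + align - 1;
    if (win_hi >= npages)
        win_hi = npages - 1;

    /* Grow the run outward from the target while neighbours are dirty. */
    size_t lo = target, hi = target;
    while (lo > win_lo && dirty[lo - 1])
        lo--;
    while (hi < win_hi && dirty[hi + 1])
        hi++;

    *start = lo;
    *end = hi;
    return true;
}
```

For example, with pages 2..9 dirty, target 5 and align 8, the sketch picks pages 2..7; without the alignment clip the run would be 2..9.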
>>
>> All of the above are complex.  The only reason for adding complexity in
>> our writeback path should be because we can demonstrate that it's
>> actually needed.  In order to demonstrate this, you'd need performance
>> measurements ... is there a plan to get these before the summit?
> 
> The situations that required writeback for reclaim to make progress
> have shrunk a lot with this merge window because of respecting page
> reserves in the dirty limits, and per-zone dirty limits.
> 
> What's left to evaluate are certain NUMA configurations where the
> dirty pages are concentrated on a few nodes.  Currently, we kick the
> flushers from direct reclaim, completely undirected, just "clean some
> pages, please".  That works for systems up to a certain size,
> depending on the size of the node in relationship to the system as a
> whole (likelihood of pages cleaned being from the target node) and how
> fast the backing storage is (impact of cleaning 'wrong' pages).
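
A back-of-the-envelope model of why undirected flushing stops scaling: assume, hypothetically, that the flushers pick dirty pages without regard to node and that every dirty page is equally likely to be picked. Then only dirty_on_node / dirty_total of what gets cleaned lands on the node under reclaim pressure. `pages_to_clean()` below is a made-up name for illustration, not kernel code; it returns how many pages must be cleaned, in expectation, to free a given number of pages on the target node.

```c
/*
 * Hypothetical model, not kernel code: under uniform, node-blind selection
 * of dirty pages, the expected number of pages that must be cleaned to
 * free `wanted` pages on the target node is
 * wanted * dirty_total / dirty_on_node.
 * Returns a negative value if the target node has no dirty pages at all.
 */
static double pages_to_clean(double wanted, double dirty_on_node,
                             double dirty_total)
{
    if (dirty_on_node <= 0.0)
        return -1.0; /* nothing on the target node to clean */
    return wanted * dirty_total / dirty_on_node;
}
```

With dirty data spread evenly over 8 nodes (1000 pages each), freeing 32 pages on one node means cleaning about 256 in this model; if all 8000 dirty pages sat on the target node, 32 would do. Slow backing storage makes the surplus "wrong" pages more expensive.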
> 
> So while the original problem is still standing, the urgency of it
> might have been reduced quite a bit or the problem itself might have
> been pushed into a corner where workarounds (e.g. spreading dirty data
> more evenly) might be more economical than trying to make writeback
> node-aware and deal with all the implications (still have to guarantee
> dirty cache expiration times for integrity; can fail spectacularly
> when there is little or no relationship between disk placement and
> memory placement, imagine round-robin allocation of disk-contiguous
> dirty cache over a few nodes).
> 
> I agree with James: find scenarios where workarounds are not feasible
> but that are important enough that the complexity would be justified.
> Otherwise, talking about how to fix them is moot.

Fine, so IO-less page reclaim is moot. What do I know; I've never seen
a NUMA machine. But that was just a by-product of half a section
in a list of 8 sections. Are all of these moot? I must be smoking something
good ;-)

Thanks
Boaz