Re: Issues with delalloc->real extent allocation

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 21 Jan 2011 12:59:40 +1100

On Thu, Jan 20, 2011 at 06:16:12AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 20, 2011 at 12:33:46PM +1100, Dave Chinner wrote:
> > It's case b) that I'm mainly worried about, esp. w.r.t the 64k page
> > size on ia64/ppc. If we only track a single dirty bit in the page,
> > then every sub-page, non-appending write to an uncached region of a
> > file becomes a RMW cycle to initialise the areas around the write
> > correctly. The question is whether we care about this enough given
> > that we return at least PAGE_SIZE in stat() to tell applications the
> > optimal IO size to avoid RMW cycles.
> 
> Note that this generally is only true for the first write into the
> region - after that we'll have the rest read into the cache.  But
> we also have the same issue for appending writes if they aren't
> page aligned.

True - I kind of implied that by saying RMW cycles are limited to
"uncached regions", but you've stated it in a much clearer and easier
to understand way. ;)
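A rough way to see when case b) bites, including the unaligned append case: under a simplified model where each page carries a single uptodate/dirty state rather than per-block state (an assumption for illustration, not XFS or VFS code), a buffered write into an uncached page must read the page first unless it covers every byte of that page that already lies inside the file. The helper name write_needs_rmw() is hypothetical and only looks at the page containing `pos`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, simplified model: one uptodate/dirty state per page.
 * Returns true if a buffered write at [pos, pos + len) must first read the
 * page containing pos (a read-modify-write cycle).
 */
static bool write_needs_rmw(uint64_t pos, uint64_t len, uint64_t page_size,
                            uint64_t old_isize, bool page_cached)
{
    assert(page_size > 0);

    uint64_t page_start = pos - (pos % page_size);
    uint64_t page_end = page_start + page_size;      /* exclusive */

    if (page_cached)
        return false;        /* the rest of the page is already in memory */

    /* Bytes at or beyond the old EOF can be zeroed instead of read. */
    uint64_t valid_end = old_isize < page_end ? old_isize : page_end;
    if (valid_end <= page_start)
        return false;        /* page lies entirely beyond the old EOF */

    /* RMW if the write leaves any existing byte of the page uncovered. */
    return pos > page_start || pos + len < valid_end;
}
```

With 64k pages, a 4k write at offset 8192 into an uncached 1MB file needs a read, as does an unaligned append at the old EOF; a full-page write, or an append that starts on a page boundary, does not.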

> > And if we only do IO on whole pages (i.e. regardless of block size)
> > .writepage suddenly becomes a lot simpler, and it also becomes trivial
> > to implement our own .readpage/.readpages....
> 
> I don't think it simplifies writepage a lot.  All the buffer head
> handling goes away, but we'll still need to do xfs_bmapi calls at
> block size granularity.  Why would you want to replace the
> readpage/readpages code?  The generic mpage helpers for it do just fine.

When I went through the mpage code I found cases where it would
attach bufferheads to pages or assume that PagePrivate() holds a
bufferhead list. For example, if there are multiple holes in the page,
it will fall through to block_read_full_page(), which makes this
assumption.  If we want/need to keep any of our own state on
PagePrivate(), we cannot use any function that assumes PagePrivate()
is used to hold bufferheads for the page.
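The fallback condition can be modelled roughly as below. This is a simplified, hypothetical restatement of the checks do_mpage_readpage() makes before it can read a page with a single bio: mapped blocks must be physically contiguous, and once a hole is seen the rest of the page must also be a hole (that tail is zeroed). It ignores other conditions the real code checks, such as the page already having buffers attached. A page that fails the check is the "confused" case that ends up in block_read_full_page(). The function name and the HOLE marker are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOLE UINT64_MAX   /* hypothetical marker for an unmapped block */

/*
 * Hypothetical, simplified model: blocks[i] is the disk block backing
 * block i of the page, or HOLE. Returns true if the page could be read
 * with one contiguous IO plus zeroing of a trailing hole; false means the
 * generic code would fall back to the bufferhead-based path.
 */
static bool mpage_can_read_directly(const uint64_t *blocks, size_t nblocks)
{
    size_t first_hole = nblocks;   /* nblocks means "no hole seen yet" */

    assert(blocks != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; i++) {
        if (blocks[i] == HOLE) {
            if (first_hole == nblocks)
                first_hole = i;
            continue;
        }
        if (first_hole != nblocks)
            return false;                      /* hole followed by data */
        if (i > 0 && blocks[i] != blocks[i - 1] + 1)
            return false;                      /* not physically contiguous */
    }
    return true;
}
```

So with 4k blocks in a 64k page, a single trailing hole is handled directly, but a hole in the middle of the page (or a discontiguous mapping) is enough to push the whole page onto the bufferhead path.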

Quite frankly, a simple extent mapping loop like the one we use for
.writepage is far simpler than what mpage_readpages does. This is
what btrfs does (extent_readpages/__extent_read_full_page), and that
is far easier to follow and understand than mpage_do_readpage()....
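A minimal sketch of such a loop, assuming a hypothetical sorted, gap-free extent table and a memory-backed "disk" standing in for xfs_bmapi and real bio submission: each iteration maps the current offset, handles everything up to the end of that extent (or the end of the request), and advances. The struct, EXT_HOLE, lookup_extent() and read_range() are all illustrative names, not kernel or XFS APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_HOLE UINT64_MAX    /* hypothetical "no disk blocks" marker */

/* Hypothetical extent record covering file bytes [start, start + len). */
struct extent {
    uint64_t start;
    uint64_t len;
    uint64_t disk;             /* first disk byte, or EXT_HOLE */
};

/* Hypothetical lookup standing in for the filesystem's mapping call. */
static const struct extent *lookup_extent(const struct extent *map, size_t n,
                                          uint64_t pos)
{
    for (size_t i = 0; i < n; i++)
        if (pos >= map[i].start && pos < map[i].start + map[i].len)
            return &map[i];
    return NULL;
}

/*
 * Read [pos, pos + len) into buf one extent at a time: holes are
 * zero-filled, mapped ranges are copied from the simulated disk (where a
 * real implementation would build and submit a bio). Returns the number
 * of IOs issued, or -1 if part of the range has no mapping.
 */
static int read_range(const struct extent *map, size_t n, const uint8_t *disk,
                      uint64_t pos, uint64_t len, uint8_t *buf)
{
    const uint64_t start = pos;
    const uint64_t end = pos + len;
    int nios = 0;

    while (pos < end) {
        const struct extent *ext = lookup_extent(map, n, pos);
        if (!ext)
            return -1;

        uint64_t ext_end = ext->start + ext->len;
        uint64_t chunk = (ext_end < end ? ext_end : end) - pos;
        uint8_t *dst = buf + (pos - start);

        if (ext->disk == EXT_HOLE) {
            memset(dst, 0, chunk);             /* no IO needed for a hole */
        } else {
            memcpy(dst, disk + ext->disk + (pos - ext->start), chunk);
            nios++;                            /* one IO per mapped extent */
        }
        pos += chunk;
    }
    return nios;
}
```

Reading a 16k range mapped as extent, hole, extent issues two IOs and zero-fills the hole without touching any per-page private state, which is the property that matters if PagePrivate() is to hold our own data.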

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs