On Thu, Jan 20, 2011 at 06:16:12AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 20, 2011 at 12:33:46PM +1100, Dave Chinner wrote:
> > It's case b) that I'm mainly worried about, esp. w.r.t the 64k page
> > size on ia64/ppc. If we only track a single dirty bit in the page,
> > then every sub-page, non-appending write to an uncached region of a
> > file becomes a RMW cycle to initialise the areas around the write
> > correctly. The question is whether we care about this enough given
> > that we return at least PAGE_SIZE in stat() to tell applications the
> > optimal IO size to avoid RMW cycles.
>
> Note that this generally is only true for the first write into the
> region - after that we'll have the rest read into the cache. But
> we also have the same issue for appending writes if they aren't
> page aligned.

True - I kind of implied that by saying RMW cycles are limited to
"uncached regions", but you've stated it in a much clearer and easier
to understand way. ;)

> > And if we only do IO on whole pages (i.e regardless of block size)
> > .writepage suddenly becomes a lot simpler, as well as being trivial
> > to implement our own .readpage/.readpages....
>
> I don't think it simplifies writepage a lot. All the buffer head
> handling goes away, but we'll still need to do xfs_bmapi calls at
> block size granularity. Why would you want to replaced the
> readpage/readpages code? The generic mpage helpers for it do just fine.

When I went through the mpage code I found there were cases where it
would attach bufferheads to pages or assume PagePrivate() contains a
bufferhead list. E.g. if there are multiple holes in the page, it
will fall through to block_read_full_page(), which makes this
assumption. If we want/need to keep any of our own state on
PagePrivate(), we cannot use any function that assumes PagePrivate()
is used to hold bufferheads for the page.

Quite frankly, a simple extent mapping loop like we do for .writepage
is far simpler than what mpage_readpages does.
This is what btrfs does (extent_readpages/__extent_read_full_page),
and that is far easier to follow and understand than
mpage_do_readpage()....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
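
For illustration only, the RMW condition discussed in this mail can be
sketched in userspace C. This is not kernel code and is not taken from
the thread: write_needs_rmw() and its parameters are hypothetical names,
and the sketch assumes a single dirty bit per page, ignores EOF handling
(regions beyond EOF would be zeroed rather than read), and assumes
pos + len does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: with only one dirty bit per
 * page, a write into an uncached region must first read in any page it
 * covers only partially, so that the bytes around the write are valid
 * before the whole page is later written back.
 */
bool write_needs_rmw(uint64_t pos, uint64_t len, uint64_t page_size,
                     bool region_cached)
{
    assert(page_size > 0);

    /* Nothing to write, or the data is already in the page cache. */
    if (len == 0 || region_cached)
        return false;

    /* A partial page at the head or the tail of the range forces a read. */
    return (pos % page_size) != 0 || ((pos + len) % page_size) != 0;
}
```

With a 64k page size, write_needs_rmw(4096, 512, 65536, false) is true,
while a full-page write such as write_needs_rmw(0, 65536, 65536, false)
is false, which is why returning at least PAGE_SIZE from stat() lets
applications avoid the RMW cycle.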