Re: Issues with delalloc->real extent allocation

Geoffrey Wehrman <gwehrman@xxxxxxx> · Mon, 17 Jan 2011 08:37:08 -0600

On Mon, Jan 17, 2011 at 04:18:28PM +1100, Dave Chinner wrote:
| On Fri, Jan 14, 2011 at 10:16:29PM -0600, Geoffrey Wehrman wrote:
| > On Sat, Jan 15, 2011 at 09:59:07AM +1100, Dave Chinner wrote:
| > | On Fri, Jan 14, 2011 at 10:40:16AM -0600, Geoffrey Wehrman wrote:
| > | > Also, I'm not saying using XFS_BMAPI_EXACT is feasible.  I have a very
| > | > minimal understanding of the writepage code path.
| > | 
| > | I think there are situations where this does make sense, but given
| > | the potential issues I'm not sure it is a solution that can be
| > | extended to the general case. A good discussion point on a different
| > | angle, though. ;)
| > 
| > You've convinced me that XFS_BMAPI_EXACT is not the optimal solution.
| > 
| > Upon further consideration, I do like your proposal to make delalloc
| > allocation more like an intent/done type operation.  The compatibility
| > issues aren't all that bad.  As long as the filesystem is unmounted
| > clean, there is no need for the next mount to do log recovery and therefore
| > no need to have any knowledge of the new transactions.
| 
| That is a good observation. If there is agreement that this is a strong
| enough backwards compatibility guarantee (it's good enough for me),
| then I think that I will start to prototype this approach.

I'm not sure how a version of XFS without the new log recovery code will
behave if it encounters a log with the new transactions.  I assume it
will gracefully abort log recovery and fail the mount, reporting a
corrupt log.  I have no objection to this compatibility guarantee.
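A toy model may help pin down the compatibility argument. The C sketch below is hypothetical throughout: recover_log(), the ITEM_* types and the id-based pairing are illustrative assumptions, not XFS log recovery code. It pairs each delalloc intent with a later done record and counts intents left unmatched (allocations recovery would have to finish or back out). Assuming, as above, that a kernel without the new recovery code rejects unknown item types, it models that kernel as failing recovery. A cleanly unmounted filesystem leaves nothing to replay, so either kernel mounts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical log item types for illustration only; these are not the
 * real XFS log item definitions.
 */
enum log_item_type {
    ITEM_OLD = 1,      /* an item type every kernel version understands */
    ITEM_DA_INTENT,    /* hypothetical: "converting delalloc range <id>" */
    ITEM_DA_DONE,      /* hypothetical: "conversion <id> has completed" */
};

struct log_item {
    enum log_item_type type;
    unsigned int id;   /* pairs an intent with its done record */
};

#define MAX_INTENTS 64

/*
 * Walk the log records in order. An intent stays outstanding until a
 * done record with the same id is seen. Returns the number of intents
 * left outstanding, or -1 if recovery must fail. A kernel without the
 * new recovery code (knows_new == false) treats the new item types as
 * unknown and fails, i.e. the mount is refused.
 */
static int recover_log(const struct log_item *log, size_t n, bool knows_new)
{
    bool pending[MAX_INTENTS] = { false };
    int outstanding = 0;

    assert(log != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct log_item *it = &log[i];

        switch (it->type) {
        case ITEM_OLD:
            break;
        case ITEM_DA_INTENT:
        case ITEM_DA_DONE:
            if (!knows_new || it->id >= MAX_INTENTS)
                return -1;  /* unknown or corrupt item: abort recovery */
            if (it->type == ITEM_DA_INTENT && !pending[it->id]) {
                pending[it->id] = true;
                outstanding++;
            } else if (it->type == ITEM_DA_DONE && pending[it->id]) {
                pending[it->id] = false;
                outstanding--;
            }
            break;
        default:
            return -1;      /* any other unknown type also fails */
        }
    }
    return outstanding;
}
```

With a log of { ITEM_OLD, ITEM_DA_INTENT 1, ITEM_DA_DONE 1, ITEM_DA_INTENT 2 }, recover_log() reports one outstanding intent when knows_new is true and -1 when it is false; an empty log (clean unmount) yields 0 either way.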

| However, this does not solve the extsize allocation issues where we
| don't have dirty pages in the page cache covering parts of the
| delayed allocation extent so we still need a solution for that. I'm
| tending towards zeroing in .aio_write as the simplest solution
| because it doesn't cause buffer head/extent tree mapping mismatches,
| and it would use the above intent/done operations for crash
| resilience so there's no additional, rarely used code path to test
| through .writepage. Does that sound reasonable?

Zeroing in .aio_write will create zeroed pages covering the entire
allocation, correct?  This seems like a reasonable and straightforward
approach.  I wish I had thought of it myself!
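To make the extsize case concrete, here is a small, hypothetical C helper (extsize_zero_ranges() and struct range are invented for illustration) that computes which parts of an extsize-aligned allocation a single write leaves uncovered. It assumes those head and tail ranges are holes, so a zeroing pass in .aio_write would fill exactly them with zeroed pages; real code would also have to skip ranges that already hold data.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open byte range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: given a write of len bytes at offset off and an
 * extent size hint extsz (bytes, > 0), compute the extsize-aligned
 * allocation and the head/tail ranges the write does not cover. Under
 * the simplifying assumption that those ranges are holes, they are what
 * a zeroing pass at .aio_write time would fill with zeroed pages.
 */
static void extsize_zero_ranges(uint64_t off, uint64_t len, uint64_t extsz,
                                struct range *alloc, struct range *head,
                                struct range *tail)
{
    assert(extsz > 0 && len > 0);

    alloc->start = off - off % extsz;                      /* round down */
    alloc->end = (off + len + extsz - 1) / extsz * extsz;  /* round up */

    head->start = alloc->start;
    head->end = off;            /* empty if the write starts aligned */
    tail->start = off + len;
    tail->end = alloc->end;     /* empty if the write ends aligned */
}
```

For a 1000-byte write at offset 5000 with a 4096-byte extent size hint, the allocation is [4096, 8192), so [4096, 5000) and [6000, 8192) would be zeroed.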

-- 
Geoffrey Wehrman  651-683-5496  gwehrman@xxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs