Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support

Brian Foster <bfoster@xxxxxxxxxx> · Sun, 20 Dec 2015 09:02:54 -0500

On Sat, Dec 19, 2015 at 12:56:23AM -0800, Darrick J. Wong wrote:
> Hi all,
> 
...
> Fixed since RFCv3:
> 
>  * The reflink and dedupe ioctls are being hoisted to the VFS, as
>    provided in the first few patches.  Patch 81 connects to this
>    functionality.
> 
>  * Copy on write has been rewritten for v4.  We now use the existing
>    delayed allocation mechanism to coalesce writes together, deferring
>    allocation until writeout time.  This enables CoW to make better
>    block placement decisions and significantly reduces overhead.
>    CoW is still pretty slow, but not as slow as before.
> 
>  * Direct IO CoW has been implemented using the same mechanism as
>    above, but modified to perform the allocation and remapping right
>    then and there.  Throughput is much higher than pushing data
>    through the page cache CoW.  (It's the same mechanism, but we're
>    playing with chunks bigger than a single memory page.)
> 
>  * CoW ENOSPC works correctly now, except in the pathological case
>    that the AG fills up and the rmap btree cannot expand.  That will
>    be addressed for v5.
> 
>  * fallocate will now unshare blocks to prevent future ENOSPC, as
>    you'd expect.
> 
>  * refcount btree blocks are preallocated at mount time to prevent
>    ENOSPC while trying to expand the tree.  This also has the effect
>    of grouping the btree blocks together, which can speed up CoW
>    remapping.
> 
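As a rough illustration of the last item above: the following is a toy
model in Python, not taken from the patches. ToyAG, alloc_data and
alloc_btree_block are names made up for this sketch, and it assumes the
reserve is simply carved out of free space as one contiguous run when the
filesystem is mounted.

```python
import errno


class ToyAG:
    """Toy allocation group: a free-block counter plus a btree reserve."""

    def __init__(self, nblocks, btree_reserve):
        if btree_reserve > nblocks:
            raise ValueError("cannot reserve more blocks than the AG holds")
        # "Mount time": set aside the first btree_reserve block numbers
        # as one contiguous run, so btree blocks end up grouped together.
        self.reserve = list(range(btree_reserve))
        self.free = nblocks - btree_reserve

    def alloc_data(self, count):
        """Allocate data blocks from general free space only."""
        if count > self.free:
            raise OSError(errno.ENOSPC, "no free data blocks")
        self.free -= count

    def alloc_btree_block(self):
        """Expand the btree from the reserve, never from free space."""
        if not self.reserve:
            raise OSError(errno.ENOSPC, "btree reserve exhausted")
        return self.reserve.pop(0)
```

In this model, data writes can drive the free count to zero and the tree
can still grow, because the two allocators never compete for the same
blocks.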

Can you elaborate on how these blocks are preallocated? E.g., is the
tree "preconstructed" in some sense? However that is done, is this the
anticipated solution or a temporary workaround?

Also, shouldn't the ENOSPC condition be handled by the AGFL? I take it
there is something going on here that renders that solution flawed, so
I'm just curious what it is.

(Sorry if this is all explained elsewhere, but I haven't yet had a
chance to take a close enough look at this feature.)

Brian

> Issues: 
> 
>  * The extent swapping ioctl still allocates a bigger fixed-size
>    transaction.  That's most likely a stupid thing to do, so getting a
>    better grip on how the journalling code works and auditing all the
>    new transaction users will have to happen.  Right now it mostly
>    gets lucky.
> 
>  * EFI tracking for the allocated-but-not-yet-mapped blocks is
>    nonexistent.  A crash will leak them.
> 
>  * ENOSPC while expanding the rmap btree can crash the FS.  For now we
>    work around this problem by making the AGFL as big as possible,
>    failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
>    available, and hoping that doesn't actually happen.
> 
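The AGFL workaround in the last item can be pictured as an up-front check.
This is only a toy sketch in Python, not the code from the series:
cow_precheck and worst_case_rmap_blocks are hypothetical names, and how
the real worst case would be computed is not stated in this thread.

```python
import errno


def cow_precheck(agfl_free, worst_case_rmap_blocks):
    """Return 0 if the CoW may proceed, or -ENOSPC if it must be refused.

    agfl_free: blocks currently sitting on the AG free list.
    worst_case_rmap_blocks: hypothetical upper bound on the blocks an
    rmap btree expansion could need for this operation.
    """
    if agfl_free < worst_case_rmap_blocks:
        # Fail early rather than run out of blocks mid-expansion.
        return -errno.ENOSPC
    return 0
```

The point of checking before starting is that a failure at this stage is
an ordinary error return, whereas running out of blocks halfway through
the btree expansion is the crash case described above.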
> If you're going to start using this mess, you probably ought to just
> pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> There are also updates for xfs-docs[4] and man-pages[5].
> 
> The patches have been xfstested with x64, i386, and ppc64; while in
> general the tests run to completion, there are still periodic bugs
> that will be addressed by the next RFC.  There's a persistent crash on
> arm64 and ppc64el that I haven't been able to triage.
> 
> This is an extraordinary way to eat your data.  Enjoy! 
> Comments and questions are, as always, welcome.
> 
> --D
> 
> [1] https://github.com/djwong/linux/tree/for-dave
> [2] https://github.com/djwong/xfsprogs/tree/for-dave
> [3] https://github.com/djwong/xfstests/tree/for-dave
> [4] https://github.com/djwong/xfs-documentation/tree/for-dave
> [5] https://github.com/djwong/man-pages/commits/for-mtk
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
