On Mon, Apr 20, 2015 at 09:28:20PM -0700, Darrick J. Wong wrote:
> On Mon, Apr 20, 2015 at 08:06:46PM -0500, xfs@xxxxxxxxxxx wrote:
> > Hello list,
> >
> > I'm prototyping something like reflinks in xfs and was wondering if
> > anyone could give me some pointers on the best way to duplicate the
>
> Heh, funny, I'm working on that too...
>
> > blocks of the shared inode at the reflink inode, the copy which must
> > occur when breaking the link.
>
> ...though I'm not sure what "the shared inode at the reflink inode" means.
> Are there somehow three inodes involved with reflinking one file to another?
>
> > It would be nice to do the transfer via the page cache after allocating
> > the space at the destination inode, but it doesn't seem like I can use
> > any of the kernel helpers for copying the data via the address_space
> > structs since I don't have a struct file on hand for the copy source.
> > I'm doing this in xfs_file_open() so the only struct file I have is the
> > file being opened for writing - the destination of the copy.
>
> So you're cloning the entire file's contents (i.e. breaking the reflink) as
> soon as the file is opened rw?
>
> > What I do have on hand is the shared inode and the destination inode
> > opened and ready to go, and the struct file for the destination.
>
> The design I'm pursuing is different from yours, I think -- two files can use
> the regular bmbt to point to the same physical blocks, and there's a per-ag
> btree that tracks reference counts for physical extents. What I'd like to do
> for the CoW operation is to clone the page (somehow), change the bmbt mapping
> to "delayed allocation", and let the dirty pages flush out like normal.
>
> I haven't figured out /how/ to do this, mind you. The rest of the bookkeeping
> parts are already written, though.

My first thought on COW was to try to use the write path get_blocks
callback to do all this. i.e.
in __xfs_get_blocks() detect that it is an overwrite of a shared
extent, remove the shared extent reference and then convert it to
delayed alloc extent. (i.e. xfs_iomap_overwrite_shared()). Then
writeback will allocate new blocks for the data.

The question, however, is how to do this in a manner such that
crashing between the breaking of the shared reference and data
writeback doesn't leave us with a hole instead of data. To deal with
that, I think that we're going to have to break shared extents during
writeback, not during the write. However, we are going to need a
delalloc reservation to do that.

So I suspect we need a new type of extent in the in-core extent tree
- a "delalloc overwrite" extent - so that when we map it in writeback
we can allocate the new extent, do the write to it, and then on IO
completion do the BMBT manipulation to break the shared reference and
insert the new extent. That solves the atomicity problem, and it
allows us to track COW data on a per-inode basis without having to
care about all the other reflink contexts to that same data.

> With reflink enabled, xfsrepair theoretically can solve multiply claimed blocks
> by simply adding the appropriate agblock:refcount entry to the refcount btree
> and it's done.

With rmap, XFS can solve multiply claimed blocks simply by looking at
who really owns the block in the rmap... :P

> > P.S. I've seen Dave Chinner's mention of reflink prototypes in XFS on
> > lwn but haven't been able to find any code, what's the status of that?

No code, because they are prototypes to determine if ideas are sane
and workable. Similar to what Darrick is doing right now, and we've
talked about it on #xfs a fair bit. Darrick has more time to work on
this right now than I do, so he's the guy doing all the heavy lifting
at the moment...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
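To make the ordering in the "delalloc overwrite" proposal concrete, here is a toy user-space model in C. It is a sketch under stated assumptions, not XFS code: every type and function below is hypothetical. The array `map` stands in for each inode's BMBT, the array `refcount` stands in for the per-AG refcount btree, and `EXT_DELALLOC_OVERWRITE` plays the role of the proposed in-core extent type. The write only marks a shared block and takes a reservation, writeback allocates and fills a new block, and only IO completion switches the mapping and drops the shared reference, so the persistent mapping never points at a hole.

```c
/* Hypothetical user-space toy, not XFS code; every name here is invented.
 * "map" stands in for the BMBT, "refcount" for the per-AG refcount btree. */
#include <assert.h>
#include <string.h>

#define NBLOCKS  8   /* physical blocks on the toy "disk" */
#define BLKSZ    16  /* bytes per block */
#define NFILEBLK 2   /* logical blocks per toy file */

enum ext_state { EXT_NORMAL, EXT_DELALLOC_OVERWRITE };

struct toy_disk {
	char data[NBLOCKS][BLKSZ];
	int refcount[NBLOCKS];  /* number of mappings per physical block */
	int next_free;          /* trivial bump allocator */
	int reserved;           /* blocks reserved for pending COW writes */
};

struct toy_inode {
	int map[NFILEBLK];              /* persistent mapping: survives a crash */
	enum ext_state state[NFILEBLK]; /* in-core extent state: lost on a crash */
	char page[NFILEBLK][BLKSZ];     /* page-cache copy of each block */
	int dirty[NFILEBLK];
	int pending[NFILEBLK];          /* COW block written back but not yet mapped */
};

void toy_disk_init(struct toy_disk *d) { memset(d, 0, sizeof(*d)); }

static void fill_block(char *dst, const char *src)
{
	memset(dst, 0, BLKSZ);
	strncpy(dst, src, BLKSZ - 1);   /* always NUL-terminated */
}

void toy_inode_create(struct toy_disk *d, struct toy_inode *ip, const char *contents[])
{
	memset(ip, 0, sizeof(*ip));
	for (int i = 0; i < NFILEBLK; i++) {
		int b = d->next_free++;
		assert(b < NBLOCKS);
		fill_block(d->data[b], contents[i]);
		d->refcount[b] = 1;
		ip->map[i] = b;
		ip->pending[i] = -1;
	}
}

/* The reflink itself: dst maps the same physical blocks as src. */
void toy_reflink(struct toy_disk *d, const struct toy_inode *src, struct toy_inode *dst)
{
	memset(dst, 0, sizeof(*dst));
	for (int i = 0; i < NFILEBLK; i++) {
		dst->map[i] = src->map[i];
		dst->pending[i] = -1;
		d->refcount[src->map[i]]++;
	}
}

/* write(): buffer the data; a shared block is only marked and reserved. */
void toy_write(struct toy_disk *d, struct toy_inode *ip, int lblk, const char *buf)
{
	fill_block(ip->page[lblk], buf);
	ip->dirty[lblk] = 1;
	if (d->refcount[ip->map[lblk]] > 1 && ip->state[lblk] == EXT_NORMAL) {
		ip->state[lblk] = EXT_DELALLOC_OVERWRITE;
		d->reserved++;
	}
}

/* Writeback: a COW block goes to a newly allocated block; map is untouched. */
void toy_writeback(struct toy_disk *d, struct toy_inode *ip, int lblk)
{
	if (!ip->dirty[lblk])
		return;
	int dst = ip->map[lblk];
	if (ip->state[lblk] == EXT_DELALLOC_OVERWRITE) {
		if (ip->pending[lblk] < 0) {
			ip->pending[lblk] = d->next_free++;  /* consume the reservation */
			assert(ip->pending[lblk] < NBLOCKS);
			d->reserved--;
		}
		dst = ip->pending[lblk];
	}
	memcpy(d->data[dst], ip->page[lblk], BLKSZ);
	ip->dirty[lblk] = 0;
}

/* IO completion: only now switch the mapping and drop the shared reference. */
void toy_io_complete(struct toy_disk *d, struct toy_inode *ip, int lblk)
{
	int nb = ip->pending[lblk];
	if (nb < 0)
		return;
	d->refcount[ip->map[lblk]]--;
	d->refcount[nb] = 1;
	ip->map[lblk] = nb;
	ip->pending[lblk] = -1;
	ip->state[lblk] = EXT_NORMAL;
}
```

Calling toy_write(), toy_writeback() and toy_io_complete() in turn on a block shared with a reflinked inode leaves the writer's `map` entry pointing at the old block, with the old data, until the last call; that `map` is what a crash before IO completion would leave behind. An overwrite of an unshared block goes in place. The model ignores what a real filesystem would have to do with an outstanding reservation, or with a block that was written back but never mapped, after a crash.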