On Sat, Jun 25, 2011 at 05:29:53PM -0700, Linda A. Walsh wrote:
> I noticed in the 'cp' (coreutils 8.9-4.1) command on SUSE, there is
> a "--reflink" option that controls "clone/CoW" copies -- which says
> it performs a 'lightweight' copy where the data blocks are copied
> only when modified. Now, 'when modified' is vague (i.e. does it mean
> blocks that differ between the two copies (src and dst), or does it
> mean only to copy blocks that were modified since 'some point'?).
> It doesn't say, but I would guess it's src/dst diffs
> (I wonder if it is restricted to the same physical filesystem).
>
> Anyway, it turns out it's only for BTRFS (which I haven't yet used,
> and therefore know only that it supports operations like the above).

Yup, it requires a refcounted, shared, copy-on-write extent index to
do efficiently.

> Would it be practical to implement some similar feature in XFS?

It could be done, but it's a fairly large chunk of work....

> It wouldn't be practical or useful to do it on an 'extent' basis due
> to their large size... So to do something similar on XFS, I was
> thinking, with "some amount of effort", some number of "updated
> extents" could be kept, in addition to the original data.

Kept where, exactly? And how do you share the original extent tree
between multiple inodes? And if all the inodes that share the original
tree get truncated, how do you know that you can break the ref-link
state and remove the original, now unreferenced tree?

Once you have solved those problems, you have effectively designed a
refcounted, shared, copy-on-write extent index....

> Any future modifications to the file would also have the extents
> modified, but any extents that overlap previous mods will be merged,
> and only the newest data would be kept (meaning that new sections
> that are written, that skip over parts of the file, wouldn't
> overwrite a pending change to that section -- only the bytes
> (granularity?) that were changed).
>
> I.e. file is 1MB.
> User1 updates bytes 1k-200k.
> User2 later updates bytes 100k-300k. A new modification 'extent' is
> created covering 1k-300k, with bytes 1k-(100k-1) saved from User1
> and 100k-300k from User2.
>
> Changes to the 'base' copy would be made upon some ioctl 'sync'
> command (file-by-file)...
>
> It would require up to double the amount of file space.

For a single reflink copy, yes. But there's nothing stopping you from
having multiple ref-link copies of the one file. And so the problem is
far more complex than you are considering.

I've looked at what it would require to implement reflinks
transparently in XFS, and it's not pretty. Major surgery to the bmap
code, a new btree type that includes back pointers to all the owner
inodes, a new shadow inode type that holds the original tree, a new
reflink inode type that contains the overwrite extent tree instead of
a normal extent tree, a bunch of new transactions, new extent
lookup/seek code, etc. I'd estimate it to be a 6 month project for
someone who knew what they were doing. It's not just kernel code,
either: all the userspace tools need to be updated to understand
reflinks and the COW-based format (repair, check, db, bmap, etc).

FWIW, I haven't even looked at how extended attributes are supposed
to be handled on reflinked files, so that could increase the
complexity significantly.

> ----
> Another possibility would simply be to create a record of byte
> ranges that have been updated in the extent and the extent's last
> modification time. Then one could compare the mod times and apply
> the changes. The problem there would be having to keep a possibly
> 'large' log of changes (what if it's not synced/purged? It couldn't
> be circular, as that would allow events to be lost -- though the
> file system could be forced 'offline' if the event log became
> full... a major pain...), but if it was created with a few G of
> space, it might take a while to fill... and if synced in time,
> no prob.
>
> Still, there may be no great desire or benefit, but DAMN if I haven't
> wanted copy-on-write files for a LONG time.

So use a filesystem that supports them natively ;)

> I.e. being able to hardlink files, but have an option to mark them as
> copy-on-write -- allowing space to be saved when copying directory
> trees, but then dynamically making new copies when someone updates
> one of the linked copies.

The problem is that a reflink sort of looks like a hard link, but in
many cases behaves like a soft link (e.g. different owners,
permissions, etc. are possible), and hence - combined with the
copy-on-write behaviour - it needs to be treated more like a soft link
in terms of implementation. Soft links have their own inode, so they
can hold state separate from the inode they are pointing to, and for
reflinked files it is simply not practical to retroactively modify the
directory structure to point at a different inode when the first COW
operation occurs.

Like I said, it can be done, but it's not a small project. If you
want to sink a significant amount of development time into the
project, we will help you in any way we can. However, I don't think
anyone has the time to do something like this for you....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
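The "refcounted, shared, copy-on-write extent index" that comes up throughout this thread can be illustrated with a toy in-memory model. What follows is a minimal, hypothetical Python sketch, not XFS or btrfs code: `Volume`, `File`, `reflink()` and the per-block refcount dict are all invented for illustration, and single blocks stand in for extents to keep it short. It shows the three behaviours raised above: a reflink copy shares data by bumping refcounts, a write to a shared block allocates a private block for the writer (COW), and truncating the last file that references a block is what frees it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy model for illustration only; not XFS or btrfs code.
# Per-block refcounts stand in for a refcounted, shared extent index.


@dataclass
class Volume:
    data: dict[int, bytes] = field(default_factory=dict)    # physical block -> contents
    refcount: dict[int, int] = field(default_factory=dict)  # physical block -> number of owners
    next_block: int = 0

    def alloc(self, payload: bytes) -> int:
        blk = self.next_block
        self.next_block += 1
        self.data[blk] = payload
        self.refcount[blk] = 1
        return blk

    def release(self, blk: int) -> None:
        # Drop one reference; free the block once no file maps it any more.
        self.refcount[blk] -= 1
        if self.refcount[blk] == 0:
            del self.refcount[blk]
            del self.data[blk]


@dataclass
class File:
    vol: Volume
    bmap: dict[int, int] = field(default_factory=dict)  # logical block -> physical block

    def read(self, lblk: int) -> bytes:
        return self.vol.data[self.bmap[lblk]]

    def write(self, lblk: int, payload: bytes) -> None:
        old = self.bmap.get(lblk)
        if old is not None and self.vol.refcount[old] == 1:
            self.vol.data[old] = payload  # sole owner: overwrite in place
            return
        # Shared block (copy-on-write) or hole: map a fresh block for this file only.
        self.bmap[lblk] = self.vol.alloc(payload)
        if old is not None:
            self.vol.release(old)  # the other owners keep the original data

    def reflink(self) -> File:
        # The clone shares every physical block; only refcounts change, no data is copied.
        for blk in self.bmap.values():
            self.vol.refcount[blk] += 1
        return File(self.vol, dict(self.bmap))

    def truncate(self) -> None:
        # Dropping the last reference to a shared block is what frees it.
        for blk in self.bmap.values():
            self.vol.release(blk)
        self.bmap.clear()
```

For example, after `b = a.reflink()` a write through `b` leaves `a` reading its original data, and only when both `a` and `b` are truncated does the volume become empty. Everything listed above as the hard part is deliberately absent from this sketch: no on-disk btree, no back pointers to owner inodes, no transactions or crash consistency, no per-clone ownership or permissions, and no extended attributes.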