On Wed, Dec 19, 2012 at 08:30:58PM -0500, Joe Landman wrote:
> On 12/19/2012 05:54 PM, Dave Chinner wrote:
> >On Wed, Dec 19, 2012 at 05:17:45PM -0500, Joe Landman wrote:
>
> [...]
>
> >> Pointers appreciated. I am looking at the copy routines in
> >>coreutils now, looking to see if we can increase its intelligence
> >>somewhat w.r.t. sparse files.
> >
> >Here's a good overview of the state of play:
> >
> >http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/sparse-improvements-LPC-2012.pdf
> >
>
> I git cloned the coreutils to look at current state of the code, and
> saw exactly what is represented on slide 14.
>
> "Brute force – read each sector in full, before
> skipping while writing the copy"

http://code.metager.de/source/xref/coreutils/src/extent-scan.c

It uses FIEMAP. Indeed, take a 1TB sparse file (empty) and copy it:

$ cp --version
cp (GNU coreutils) 8.13
$ ls -lh blah
-rw-r--r-- 1 root root 1.0T Nov 30 06:42 blah
$ xfs_bmap -v blah
blah: no extents
$ strace cp --sparse=always blah fred
.....
stat("fred", 0x7fff98d7ded0) = -1 ENOENT (No such file or directory)
stat("blah", {st_mode=S_IFREG|0600, st_size=1099511627776, ...}) = 0
stat("fred", 0x7fff98d7dc50) = -1 ENOENT (No such file or directory)
open("blah", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0600, st_size=1099511627776, ...}) = 0
open("fred", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
ioctl(3, FS_IOC_FIEMAP, 0x7fff98d7c9d0) = 0
ftruncate(4, 1099511627776) = 0
close(4) = 0
close(3) = 0
...
$ xfs_bmap -v fred
fred: no extents
$

Looks like cp already does what you want - it didn't copy a TB of
zeros..... ;)

> >And what you really want is a version of cp that supports these:
> >
> >$ man lseek
> >....
> >   Seeking file data and holes
> >       Since version 3.1, Linux supports the following additional
> >       values for whence:
> >
> >       SEEK_DATA
> >              Adjust the file offset to the next location in the
> >              file greater than or equal to offset containing data.
> >              If offset points to data, then the file offset is set
> >              to offset.
> >
> >       SEEK_HOLE
> >              Adjust the file offset to the next hole in the file
> >              greater than or equal to offset. If offset points
> >              into the middle of a hole, then the file offset is
> >              set to offset. If there is no hole past offset, then
> >              the file offset is adjusted to the end of the file
> >              (i.e., there is an implicit hole at the end of any
> >              file).
> >.....
>
> Something akin to this. Actually would like to be able to have it
> pull bmap data so that reading over a file only reads populated
> extents, so that anything that is not populated is nulled out by
> definition. I am guessing that these are the abstraction above bmap
> type data?

Except that bmap/fiemap data is not sufficient to correctly determine
whether extents have data over them or not. You have to sync the
file first, and even then you have to treat unwritten extents as
data, as you can't avoid races with overwrites putting data into the
extents while the copy is in progress.

Seriously, SEEK_HOLE/SEEK_DATA is what you want because it has none
of these issues. If you use FIEMAP, you've got to understand a lot
about how filesystems work to use it so you don't miss real data,
and different filesystems have subtly different semantics that make
this very difficult indeed.

> Was thinking of hacking something up at a much higher level
> (cheating by parsing bmap data and stuff like that).

Don't. You'll only get it wrong, just like the initial attempts to
use FIEMAP in cp did. There's a reason coreutils is moving to
SEEK_HOLE/SEEK_DATA instead of FIEMAP for efficient sparse file
handling....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
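
For illustration only, here is a rough sketch of what a SEEK_DATA/SEEK_HOLE
based copy loop could look like. It is written in Python for brevity and is
not the coreutils implementation; the names data_segments and sparse_copy
are made up for this example. It assumes Python 3.3+ on a platform that
exposes os.SEEK_DATA and os.SEEK_HOLE. On a filesystem without real
hole support the kernel may simply report the whole file as data, in which
case this should degrade to a plain full copy. Unlike cp --sparse=always it
makes no attempt to detect runs of zeros inside data ranges.

```python
import errno
import os


def data_segments(fd, size):
    """Yield (start, end) byte ranges that lseek reports as data."""
    offset = 0
    while offset < size:
        try:
            # Next offset >= `offset` that contains data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                return  # only a hole remains up to end of file
            raise
        # Next hole after that data; end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst, reading only the ranges reported as data."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            size = os.fstat(src).st_size
            for start, end in data_segments(src, size):
                pos = start
                while pos < end:
                    buf = os.pread(src, min(chunk, end - pos), pos)
                    if not buf:
                        break  # source shrank underneath us
                    view = memoryview(buf)
                    written = 0
                    while written < len(buf):
                        written += os.pwrite(dst, view[written:], pos + written)
                    pos += len(buf)
            # Extend the destination to full length; a trailing hole stays a hole.
            os.ftruncate(dst, size)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

With the files from the transcript above this would be called as
sparse_copy("blah", "fred"). For an entirely empty sparse file the first
SEEK_DATA call fails with ENXIO, so no data is read and only the final
ftruncate touches the destination.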