On Thu, Mar 28, 2024 at 04:39:01PM -0400, Stefan Hajnoczi wrote: > cp(1) and backup tools use llseek(SEEK_HOLE/SEEK_DATA) to skip holes in files. As a minor point of clarity (perhaps as much for my own records for documenting research I've done over the years, and not necessarily something you need to change in the commit message): Userspace apps generally use lseek(2) from glibc or similar (perhaps via its alias lseek64(), depending on whether userspace is using large file offsets), rather than directly calling the _llseek() syscall. But it all boils down to the same notion of seeking information about various special offsets. Also, in past history, coreutils cp(1) and dd(1) did experiment with using FS_IOC_FIEMAP ioctls when SEEK_HOLE was not available, but it proved to cause more problems than it solved, so that is not currently in favor. Yes, we could teach more and more block devices to expose specific ioctls for querying sparseness boundaries, and then teach userspace apps a list of ioctls to try; but as cp(1) already learned, having one common interface is much easier than an ever-growing ioctl ladder to be copied across every client that would benefit from knowing where the unallocated portions are. > This can speed up the process by reducing the amount of data read and it > preserves sparseness when writing to the output file. > > This patch series is an initial attempt at implementing > llseek(SEEK_HOLE/SEEK_DATA) for block devices. I'm looking for feedback on this > approach and suggestions for resolving the open issues. One of your open issues was whether adjusting the offset of the block device itself should also adjust the file offset of the underlying file (at least in the case of loopback and dm-linear). What would the community think of adding two additional constants to the entire family of *seek() functions, that have the effect of returning the same offset as their SEEK_HOLE/SEEK_DATA counterparts but without moving the file offset? Explaining the idea by example, although I'm not stuck on these names: suppose you have an fd visiting a file description of 2MiB in size, with the first 1MiB being a hole and the second being data. #define MiB (1024*1024) lseek64(fd, MiB, SEEK_SET); // returns MiB, file offset changed to MiB lseek64(fd, 0, SEEK_HOLE); // returns 0, file offset changed to 0 lseek64(fd, 0, SEEK_DATA); // returns MiB, file offset changed to MiB lseek64(fd, 0, SEEK_PEEK_HOLE); // returns 0, but file offset left at MiB lseek64(fd, 0, SEEK_SET); // returns 0, file offset changed to MiB lseek64(fd, 0, SEEK_PEEK_DATA); // returns MiB, but file offset left at MiB With semantics like that, it might be easier to implement just SEEK_PEEK* in devices (don't worry about changing offsets, just about reporting where the requested offset is), and then have a common layer do the translation from llseek(...,offs,SEEK_HOLE) into a 2-step llseek(...,llseek(...,offs,SEEK_PEEK_HOLE),SEEK_SET) if that makes life easier under the hood. > > In the block device world there are similar concepts to holes: > - SCSI has Logical Block Provisioning where the "mapped" state would be > considered data and other states would be considered holes. BIG caveat here: the SCSI spec does not necessarily guarantee that unmapped regions read as all zeroes; compare the difference between FALLOC_FL_ZERO_RANGE and FALLOC_FL_PUNCH_HOLE. While lseek(SEEK_HOLE) on a regular file guarantees that future read() in that hole will see NUL bytes, I'm not sure whether we want to make that guarantee for block devices. This may be yet another case where we might want to add new SEEK_* constants to the *seek() family of functions that lets the caller indicate whether they want offsets that are guaranteed to read as zero, vs. merely offsets that are not allocated but may or may not read as zero. Skipping unallocated portions, even when you don't know if the contents reliably read as zero, is still a useful goal in some userspace programs. > - NBD has NBD_CMD_BLOCK_STATUS for querying whether blocks are present. However, utilizing it in nbd.ko would require teaching the kernel to handle structured or extended headers (right now, that is an extension only supported in user-space implementations of the NBD protocol). I can see why you did not tackle that in this RFC series, even though you mention it in the cover letter. > - Linux loop block devices and dm-linear targets can pass through queries to > the backing file. > - dm-thin targets can query metadata to find holes. > - ...and you may be able to think of more examples. > > Therefore it is possible to offer this functionality in block drivers. > > In my use case a QEMU process in userspace copies the contents of a dm-thin > target. QEMU already uses SEEK_HOLE but that doesn't work on dm-thin targets > without this patch series. > > Others have also wished for block device support for SEEK_HOLE. Here is an open > issue from the BorgBackup project: > https://github.com/borgbackup/borg/issues/5609 > > With these patches userspace can identify holes in loop, dm-linear, and dm-thin > devices. This is done by adding a seek_hole_data() callback to struct > block_device_operations. When the callback is NULL the entire device is > considered data. Device-mapper is extended along the same lines so that targets > can provide seek_hole_data() callbacks. > > I'm unfamiliar with much of this code and have probably missed locking > requirements. Since llseek() executes synchronously like ioctl() and is not an > asynchronous I/O request it's possible that my changes to the loop block driver > and dm-thin are broken (e.g. what if the loop device fd is changed during > llseek()?). > > To run the tests: > > # make TARGETS=block_seek_hole -C tools/testing/selftests run_tests > > The code is also available here: > https://gitlab.com/stefanha/linux/-/tree/block-seek-hole > > Please take a look and let me know your thoughts. Thanks! > > Stefan Hajnoczi (9): > block: add llseek(SEEK_HOLE/SEEK_DATA) support > loop: add llseek(SEEK_HOLE/SEEK_DATA) support > selftests: block_seek_hole: add loop block driver tests > dm: add llseek(SEEK_HOLE/SEEK_DATA) support > selftests: block_seek_hole: add dm-zero test > dm-linear: add llseek(SEEK_HOLE/SEEK_DATA) support > selftests: block_seek_hole: add dm-linear test > dm thin: add llseek(SEEK_HOLE/SEEK_DATA) support > selftests: block_seek_hole: add dm-thin test > > tools/testing/selftests/Makefile | 1 + > .../selftests/block_seek_hole/Makefile | 17 +++ > include/linux/blkdev.h | 7 ++ > include/linux/device-mapper.h | 5 + > block/fops.c | 43 ++++++- > drivers/block/loop.c | 36 +++++- > drivers/md/dm-linear.c | 25 ++++ > drivers/md/dm-thin.c | 77 ++++++++++++ > drivers/md/dm.c | 68 ++++++++++ > .../testing/selftests/block_seek_hole/config | 3 + > .../selftests/block_seek_hole/dm_thin.sh | 80 ++++++++++++ > .../selftests/block_seek_hole/dm_zero.sh | 31 +++++ > .../selftests/block_seek_hole/map_holes.py | 37 ++++++ > .../testing/selftests/block_seek_hole/test.py | 117 ++++++++++++++++++ > 14 files changed, 540 insertions(+), 7 deletions(-) > create mode 100644 tools/testing/selftests/block_seek_hole/Makefile > create mode 100644 tools/testing/selftests/block_seek_hole/config > create mode 100755 tools/testing/selftests/block_seek_hole/dm_thin.sh > create mode 100755 tools/testing/selftests/block_seek_hole/dm_zero.sh > create mode 100755 tools/testing/selftests/block_seek_hole/map_holes.py > create mode 100755 tools/testing/selftests/block_seek_hole/test.py > > -- > 2.44.0 > -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org