On 2/13/20 5:48 PM, Dave Chinner wrote: > On Thu, Feb 13, 2020 at 03:12:24PM -0600, Eric Sandeen wrote: >> I had a request from someone who cared about mkfs speed(!) >> over a slower network block device to look into using faster >> zeroing methods, particularly for the log, during mkfs.xfs. >> >> e2fsprogs already does this, thanks to some guy named Darrick: >> >> /* >> * If we know about ZERO_RANGE, try that before we try PUNCH_HOLE because >> * ZERO_RANGE doesn't unmap preallocated blocks. We prefer fallocate because >> * it always invalidates page cache, and libext2fs requires that reads after >> * ZERO_RANGE return zeroes. >> */ >> static int __unix_zeroout(int fd, off_t offset, off_t len) >> { >> int ret = -1; >> >> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_ZERO_RANGE) >> ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len); >> if (ret == 0) >> return 0; >> #endif >> #if defined(HAVE_FALLOCATE) && defined(FALLOC_FL_PUNCH_HOLE) && defined(FALLOC_FL_KEEP_SIZE) >> ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, >> offset, len); >> if (ret == 0) >> return 0; >> #endif >> errno = EOPNOTSUPP; >> return ret; >> } >> >> and nobody has exploded so far, AFAIK. :) So, floating this idea >> for xfsprogs. I'm a little scared of the second #ifdef block above, but >> if that's really ok/consistent/safe we could add it too. > > If FALLOC_FL_PUNCH_HOLE is defined, then FALLOC_FL_KEEP_SIZE is > guaranteed to be defined, so that condition check is somewhat > redundant. See commit 79124f18b335 ("fs: add hole punching to > fallocate").... Style aside, is that 2nd zeroing method something we want to try? >> The patch moves some defines around too, I could split that up and resend >> if this isn't laughed out of the room. >> >> Thanks, >> -Eric >> >> ===== >> >> libxfs: use FALLOC_FL_ZERO_RANGE in libxfs_device_zero >> >> I had a request from someone who cared about mkfs speed(!) >> over a slower network block device to look into using faster >> zeroing methods, particularly for the log, during mkfs. >> >> Using FALLOC_FL_ZERO_RANGE is faster in this case than writing >> a bunch of zeros across a wire. >> >> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx> >> --- >> >> diff --git a/include/linux.h b/include/linux.h >> index 8f3c32b0..425badb5 100644 >> --- a/include/linux.h >> +++ b/include/linux.h >> @@ -113,6 +113,26 @@ static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src) >> uuid_copy(*dst, *src); >> } >> >> +#ifndef FALLOC_FL_PUNCH_HOLE >> +#define FALLOC_FL_PUNCH_HOLE 0x02 >> +#endif >> + >> +#ifndef FALLOC_FL_COLLAPSE_RANGE >> +#define FALLOC_FL_COLLAPSE_RANGE 0x08 >> +#endif >> + >> +#ifndef FALLOC_FL_ZERO_RANGE >> +#define FALLOC_FL_ZERO_RANGE 0x10 >> +#endif >> + >> +#ifndef FALLOC_FL_INSERT_RANGE >> +#define FALLOC_FL_INSERT_RANGE 0x20 >> +#endif >> + >> +#ifndef FALLOC_FL_UNSHARE_RANGE >> +#define FALLOC_FL_UNSHARE_RANGE 0x40 >> +#endif > > These were added to allow xfs_io to test these operations before > there was userspace support for them. I do not think we should > propagate them outside of xfs_io - if they are not supported by the > distro at build time, we shouldn't attempt to use them in mkfs. > > i.e. xfs_io is test-enablement code for developers, mkfs.xfs is > production code for users, so different rules kinda exist for > them... Fair enough. people /could/ use newer xfsprogs on older kernels, but... those defines are getting pretty old by now in any case. It's probably not terrible to just fail the build on a system that doesn't have falloc.h, by now? >> diff --git a/libxfs/Makefile b/libxfs/Makefile >> index fbcc963a..b4e8864b 100644 >> --- a/libxfs/Makefile >> +++ b/libxfs/Makefile >> @@ -105,6 +105,10 @@ CFILES = cache.c \ >> # >> #LCFLAGS += >> >> +ifeq ($(HAVE_FALLOCATE),yes) >> +LCFLAGS += -DHAVE_FALLOCATE >> +endif > > HAVE_FALLOCATE comes from an autoconf test. I suspect that this > needs to be made more finegrained, testing for fallocate() features, > not just just whether the syscall exists. And the above should > probably be in include/builddefs.in so that it's available to all > of xfsprogs, not just libxfs code... But that's testing the kernel on the build host, right? What's the point of that? Or am I misunderstanding? >> + >> FCFLAGS = -I. >> >> LTLIBS = $(LIBPTHREAD) $(LIBRT) >> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c >> index 0d9d7202..94f63bbf 100644 >> --- a/libxfs/rdwr.c >> +++ b/libxfs/rdwr.c >> @@ -4,6 +4,9 @@ >> * All Rights Reserved. >> */ >> >> +#if defined(HAVE_FALLOCATE) >> +#include <linux/falloc.h> >> +#endif > > Urk. That should be in include/linux.h, right? > >> >> #include "libxfs_priv.h" >> #include "init.h" >> @@ -60,9 +63,21 @@ int >> libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len) >> { >> xfs_off_t start_offset, end_offset, offset; >> - ssize_t zsize, bytes; >> + ssize_t zsize, bytes, len_bytes; >> char *z; >> - int fd; >> + int ret, fd; >> + >> + fd = libxfs_device_to_fd(btp->dev); >> + start_offset = LIBXFS_BBTOOFF64(start); >> + end_offset = LIBXFS_BBTOOFF64(start + len) - start_offset; >> + >> +#if defined(HAVE_FALLOCATE) >> + /* try to use special zeroing methods, fall back to writes if needed */ >> + len_bytes = LIBXFS_BBTOOFF64(len); >> + ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes); >> + if (ret == 0) >> + return 0; >> +#endif > > I kinda dislike the "return if success" hidden inside the ifdef - > it's not a code pattern I'd expect to see. This is what I'd tend > to expect in include/linux.h: > > #if defined(HAVE_FALLOCATE_ZERO_RANGE) > static inline int > platform_zero_range( > int fd, > xfs_off_t start_offset, > xfs_off_t end_offset) > { > int ret; > > ret = fallocate(fd, FALLOC_FL_ZERO_RANGE, start_offset, len_bytes); > if (!ret) > return 0; > return -errno; > } > #else > #define platform_zero_range(fd, s, o) (true) > #endif > > and then the code in libxfs_device_zero() does: > > error = platform_zero_range(fd, start_offset, len_bytes); > if (!error) > return 0; > > without adding nasty #defines... Yeah that's much better. Tho I hate adding more platform_* stuff when really we're down to only one platform but maybe that's a project for another day. Thanks for the review, -Eric