On Wed, Mar 20, 2024 at 08:25:31AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 20, 2024 at 10:36:42AM -0400, Josef Bacik wrote:
> > Btrfs had a deadlock that you could trigger by mmap'ing a large file and
> > using that as the buffer for fiemap.  This test adds a c program to do
> > this, and the fstest creates a large enough file and then runs the
> > reproducer on the file.  Without the fix btrfs deadlocks, with the fix
> > we pass fine.
> >
> > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> > ---
> > v2->v3:
> > - Add fiemap-fault to .gitignore
> > - Added a _cleanup() helper
> > - Just let the output of fiemap-fault go instead of using || _fail
> > - Added the munmap
> > - Moved $dst to $TEST_DIR/$seq
> >
> >  .gitignore            |  1 +
> >  src/Makefile          |  2 +-
> >  src/fiemap-fault.c    | 74 +++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/808     | 48 ++++++++++++++++++++++++++++
> >  tests/generic/808.out |  2 ++
> >  5 files changed, 126 insertions(+), 1 deletion(-)
> >  create mode 100644 src/fiemap-fault.c
> >  create mode 100755 tests/generic/808
> >  create mode 100644 tests/generic/808.out
> >
> > diff --git a/.gitignore b/.gitignore
> > index 3b160209..f0fb72bd 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -205,6 +205,7 @@ tags
> >  /src/vfs/mount-idmapped
> >  /src/log-writes/replay-log
> >  /src/perf/*.pyc
> > +/src/filemap-fault
> >
> >  # Symlinked files
> >  /tests/generic/035.out
> > diff --git a/src/Makefile b/src/Makefile
> > index e7442487..ab98a06f 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -34,7 +34,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
> >  	attr_replace_test swapon mkswap t_attr_corruption t_open_tmpfiles \
> >  	fscrypt-crypt-util bulkstat_null_ocount splice-test chprojid_fail \
> >  	detached_mounts_propagation ext4_resize t_readdir_3 splice2pipe \
> > -	uuid_ioctl t_snapshot_deleted_subvolume
> > +	uuid_ioctl t_snapshot_deleted_subvolume fiemap-fault
> >
> >  EXTRA_EXECS = dmerror fill2attr fill2fs fill2fs_check scaleread.sh \
> >  	btrfs_crc32c_forged_name.py popdir.pl popattr.py \
> > diff --git a/src/fiemap-fault.c b/src/fiemap-fault.c
> > new file mode 100644
> > index 00000000..73260068
> > --- /dev/null
> > +++ b/src/fiemap-fault.c
> > @@ -0,0 +1,74 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2024 Meta Platforms, Inc.  All Rights Reserved.
> > + */
> > +
> > +#include <sys/ioctl.h>
> > +#include <sys/mman.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <linux/fs.h>
> > +#include <linux/types.h>
> > +#include <linux/fiemap.h>
> > +#include <err.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +
> > +int prep_mmap_buffer(int fd, void **addr)
> > +{
> > +	struct stat st;
> > +	int ret;
> > +
> > +	ret = fstat(fd, &st);
> > +	if (ret)
> > +		err(1, "failed to stat %d", fd);
> > +
> > +	*addr = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> > +	if (*addr == MAP_FAILED)
> > +		err(1, "failed to mmap %d", fd);
> > +
> > +	return st.st_size;
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	struct fiemap *fiemap;
> > +	size_t sz, last = 0;
> > +	void *buf = NULL;
> > +	int ret, fd;
> > +
> > +	if (argc != 2)
> > +		errx(1, "no in and out file name arguments given");
> > +
> > +	fd = open(argv[1], O_RDWR, 0666);
> > +	if (fd == -1)
> > +		err(1, "failed to open %s", argv[1]);
> > +
> > +	sz = prep_mmap_buffer(fd, &buf);
> > +
> > +	fiemap = (struct fiemap *)buf;
> > +	fiemap->fm_flags = 0;
> > +	fiemap->fm_extent_count = (sz - sizeof(struct fiemap)) /
> > +				  sizeof(struct fiemap_extent);
> > +
> > +	while (last < sz) {
> > +		int i;
> > +
> > +		fiemap->fm_start = last;
> > +		fiemap->fm_length = sz - last;
> > +
> > +		ret = ioctl(fd, FS_IOC_FIEMAP, (unsigned long)fiemap);
> > +		if (ret < 0)
> > +			err(1, "fiemap failed %d", errno);
> > +		for (i = 0; i < fiemap->fm_mapped_extents; i++)
> > +			last = fiemap->fm_extents[i].fe_logical +
> > +				fiemap->fm_extents[i].fe_length;
> > +	}
> > +
> > +	munmap(buf, sz);
> > +	close(fd);
> > +	return 0;
> > +}
> > diff --git a/tests/generic/808 b/tests/generic/808
> > new file mode 100755
> > index 00000000..36015f35
> > --- /dev/null
> > +++ b/tests/generic/808
> > @@ -0,0 +1,48 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2024 Meta Platforms, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test 808
> > +#
> > +# Test fiemap into an mmaped buffer of the same file
> > +#
> > +# Create a reasonably large file, then run a program which mmaps it and uses
> > +# that as a buffer for an fiemap call.  This is a regression test for btrfs
> > +# where we used to hold a lock for the duration of the fiemap call which would
> > +# result in a deadlock if we page faulted.
> > +#
> > +. ./common/preamble
> > +_begin_fstest quick auto fiemap
> > +[ $FSTYP == "btrfs" ] && \
> > +	_fixed_by_kernel_commit b0ad381fa769 \
> > +		"btrfs: fix deadlock with fiemap and extent locking"
> > +
> > +_cleanup()
> > +{
> > +	rm -f $dst
> > +	cd /
> > +	rm -r -f $tmp.*
> > +}
> > +
> > +# real QA test starts here
> > +_supported_fs generic
> > +_require_test
> > +_require_odirect
> > +_require_test_program fiemap-fault
> > +dst=$TEST_DIR/$seq/fiemap-fault
> > +
> > +mkdir -p $TEST_DIR/$seq
> > +
> > +echo "Silence is golden"
> > +
> > +for i in $(seq 0 2 1000)
> > +do
> > +	$XFS_IO_PROG -d -f -c "pwrite -q $((i * 4096)) 4096" $dst
> > +done
>
> I don't know if there's a specific reason that this does directio writes
> at alternating offsets other than forcing allocations, but usually we do:
>
> $XFS_IO_PROG -f -c "pwrite -q 0 409600" $dst
> $src/punch-alternating $dst
>
> to generate a file with a bunch of extent records.  Also, since this is
> a generic test that wants to create a file with sparse holes, it really
> ought to be querying the file's allocation unit size:
>
> blksz=$(_get_file_block_size $TEST_DIR)
> $XFS_IO_PROG -f -c "pwrite -q 0 $((blksz * 100))" $dst

Ok, I can do that instead.  You're correct, all I want is a bunch of
extents, and for btrfs at least, doing alternating directio writes is
enough to get that.  Thanks,

Josef
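
For readers without an fstests checkout, the idea discussed above (size the
writes by the filesystem block size and leave every other block unwritten so
the file ends up with many separate extents) can be sketched stand-alone.
This is only an illustration under stated assumptions: make_alternating_file
is a hypothetical helper that does not exist in fstests, GNU stat and dd
stand in for _get_file_block_size and $XFS_IO_PROG, and skipping blocks with
dd stands in for src/punch-alternating. The real test should use the fstests
helpers quoted in the thread.

```shell
#!/bin/bash
# Stand-alone illustration only. fstests itself would use
# _get_file_block_size, $XFS_IO_PROG and src/punch-alternating.

# make_alternating_file FILE NBLOCKS  (hypothetical helper, not in fstests)
# Writes blocks 0, 2, 4, ... below NBLOCKS and leaves the odd blocks as
# holes, then prints the block size that was used.
make_alternating_file()
{
	local file=$1 nblocks=$2
	local blksz i

	# %S is the fundamental block size of the filesystem holding FILE
	# (GNU stat); used here as a rough stand-in for the allocation unit.
	blksz=$(stat -f -c %S "$(dirname "$file")") || return 1

	rm -f "$file"
	for ((i = 0; i < nblocks; i += 2)); do
		# seek=$i skips $i output blocks, conv=notrunc keeps what was
		# written on earlier iterations.
		dd if=/dev/zero of="$file" bs="$blksz" seek="$i" count=1 \
			conv=notrunc status=none || return 1
	done
	echo "$blksz"
}
```

Something like blksz=$(make_alternating_file /tmp/fiemap-demo 100) would then
leave a file that could be passed to the reproducer. Whether each written
block really becomes its own extent depends on the filesystem, so treat this
as a sketch of the layout rather than a guarantee about extent counts.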