On Wed, Sep 2, 2020 at 4:03 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > We had a problem recently where btrfs would deadlock with > O_DIRECT|O_DSYNC because of an unexpected dependency on ->fsync in > iomap. This was only caught by chance with aiostress, because weirdly > we don't actually test this particular configuration anywhere in > xfstests. Fix this by adding a basic test that just does > O_DIRECT|O_DSYNC writes. With this test the box deadlocks right away > with Btrfs, which would have been helpful in finding this issue before > the patches were merged. > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > --- > .gitignore | 1 + > src/aio-dio-regress/dio-dsync.c | 61 +++++++++++++++++++++++++++++++++ > tests/generic/609 | 44 ++++++++++++++++++++++++ > tests/generic/group | 1 + > 4 files changed, 107 insertions(+) > create mode 100644 src/aio-dio-regress/dio-dsync.c > create mode 100755 tests/generic/609 > > diff --git a/.gitignore b/.gitignore > index 5f5c4a0f..07c8014b 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -175,6 +175,7 @@ > /src/aio-dio-regress/aio-last-ref-held-by-io > /src/aio-dio-regress/aiocp > /src/aio-dio-regress/aiodio_sparse2 > +/src/aio-dio-regress/dio-dsync > /src/log-writes/replay-log > /src/perf/*.pyc > > diff --git a/src/aio-dio-regress/dio-dsync.c b/src/aio-dio-regress/dio-dsync.c > new file mode 100644 > index 00000000..53cda9ac > --- /dev/null > +++ b/src/aio-dio-regress/dio-dsync.c > @@ -0,0 +1,61 @@ > +// SPDX-License-Identifier: GPL-2.0-or-newer > +/* > + * Copyright (c) 2020 Facebook. > + * All Rights Reserved. > + * > + * Do some O_DIRECT writes with O_DSYNC to exercise this path. > + */ > +#include <stdio.h> > +#include <sys/types.h> > +#include <sys/stat.h> > +#include <fcntl.h> > +#include <unistd.h> > +#include <stdlib.h> > +#include <errno.h> > +#include <string.h> > + > +int main(int argc, char **argv) > +{ > + struct stat st; > + char *buf; > + ssize_t ret; > + int fd, i; > + int bufsize; > + > + if (argc != 2) { > + printf("Usage: %s filename\n", argv[0]); > + return 1; > + } > + > + fd = open(argv[1], O_DIRECT | O_RDWR | O_TRUNC | O_CREAT | O_DSYNC, > + 0644); > + if (fd < 0) { > + perror(argv[1]); > + return 1; > + } > + > + if (fstat(fd, &st)) { > + perror(argv[1]); > + return 1; > + } > + bufsize = st.st_blksize * 10; > + > + ret = posix_memalign((void **)&buf, st.st_blksize, bufsize); > + if (ret) { > + errno = ret; > + perror("allocating buffer"); > + return 1; > + } > + > + memset(buf, 'a', bufsize); > + for (i = 0; i < 10; i++) { > + ret = write(fd, buf, bufsize); > + if (ret < 0) { > + perror("writing"); > + return 1; > + } > + } > + free(buf); > + close(fd); > + return 0; > +} > diff --git a/tests/generic/609 b/tests/generic/609 > new file mode 100755 > index 00000000..8a888eb9 > --- /dev/null > +++ b/tests/generic/609 > @@ -0,0 +1,44 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2020 Josef Bacik. All Rights Reserved. > +# > +# FS QA Test 609 > +# > +# iomap can call generic_write_sync() if we're O_DSYNC, so write a basic test to > +# exercise O_DSYNC so any unsuspecting file systems will get lockdep warnings if > +# they're locking isn't compatible. > +# > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > + rm -rf $TEST_DIR/file > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# remove previous $seqres.full before test > +rm -f $seqres.full > + > +# Modify as appropriate. > +_supported_fs generic > +_supported_os Linux > +_require_test > +_require_aiodio dio-dsync > + > +$AIO_TEST $TEST_DIR/file This can be triggered with xfs_io and without adding a new test program: #!/bin/bash mkfs.btrfs -f /dev/sdj mount /dev/sdj /mnt/sdj xfs_io -f -d -c "pwrite -D -V 1 0 4K" /mnt/sdj/foobar umount /mnt/sdj It triggers the bug right away: [22674.919276] BTRFS: device fsid 90a97705-b2ca-46a9-a686-87a131f91786 devid 1 transid 5 /dev/sdj scanned by mkfs.btrfs (1325449) [22674.960097] BTRFS info (device sdj): disk space caching is enabled [22674.960101] BTRFS info (device sdj): has skinny extents [22674.960103] BTRFS info (device sdj): flagging fs with big metadata feature [22674.965914] BTRFS info (device sdj): checking UUID tree [22675.034231] ============================================ [22675.034835] WARNING: possible recursive locking detected [22675.035444] 5.9.0-rc3-btrfs-next-67 #1 Not tainted [22675.036049] -------------------------------------------- [22675.036657] xfs_io/1325480 is trying to acquire lock: [22675.037273] ffff9cd0b4aea658 (&sb->s_type->i_mutex_key#15){++++}-{3:3}, at: btrfs_sync_file+0x179/0x600 [btrfs] [22675.037936] but task is already holding lock: [22675.039143] ffff9cd0b4aea658 (&sb->s_type->i_mutex_key#15){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x5f0 [btrfs] [22675.039791] other info that might help us debug this: [22675.041007] Possible unsafe locking scenario: [22675.042234] CPU0 [22675.042838] ---- [22675.043433] lock(&sb->s_type->i_mutex_key#15); [22675.044025] lock(&sb->s_type->i_mutex_key#15); [22675.044624] *** DEADLOCK *** [22675.046238] May be due to missing lock nesting notation [22675.047326] 3 locks held by xfs_io/1325480: [22675.047855] #0: ffff9cd0ad8ac470 (sb_writers#14){.+.+}-{0:0}, at: vfs_writev+0xd8/0xf0 [22675.048439] #1: ffff9cd0b4aea658 (&sb->s_type->i_mutex_key#15){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x5f0 [btrfs] [22675.049038] #2: ffff9cd0b4aea4c8 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs] [22675.049633] stack backtrace: [22675.050727] CPU: 0 PID: 1325480 Comm: xfs_io Not tainted 5.9.0-rc3-btrfs-next-67 #1 [22675.051275] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [22675.052399] Call Trace: [22675.052976] dump_stack+0x8d/0xc0 [22675.053559] __lock_acquire.cold+0x210/0x2e9 [22675.054145] ? rcu_read_lock_sched_held+0x5d/0x90 [22675.054777] lock_acquire+0xb1/0x470 [22675.055352] ? btrfs_sync_file+0x179/0x600 [btrfs] [22675.055923] down_write+0x40/0x130 [22675.056496] ? btrfs_sync_file+0x179/0x600 [btrfs] [22675.057044] btrfs_sync_file+0x179/0x600 [btrfs] [22675.057593] ? __do_sys_kcmp+0x980/0xce0 [22675.058143] iomap_dio_complete+0x112/0x130 [22675.058679] ? iomap_dio_rw+0x2c4/0x620 [22675.059199] iomap_dio_rw+0x494/0x620 [22675.059723] ? btrfs_direct_IO+0xd5/0x160 [btrfs] [22675.060238] btrfs_direct_IO+0xd5/0x160 [btrfs] [22675.060738] btrfs_file_write_iter+0x215/0x5f0 [btrfs] [22675.061221] do_iter_readv_writev+0x169/0x1f0 [22675.061698] do_iter_write+0x80/0x1b0 [22675.062170] vfs_writev+0xa6/0xf0 [22675.062643] ? syscall_enter_from_user_mode+0xb4/0x270 [22675.063114] ? find_held_lock+0x32/0x90 [22675.063581] ? syscall_enter_from_user_mode+0xb4/0x270 [22675.064045] do_pwritev+0x8f/0xd0 [22675.064489] do_syscall_64+0x33/0x80 [22675.064934] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [22675.065372] RIP: 0033:0x7fac51c898b9 [22675.065792] Code: 89 fd 53 44 89 c3 48 83 ec 18 64 8b 04 25 18 00 00 00 85 c0 0f 85 97 00 00 00 45 89 c1 49 89 ca 45 31 c0 b8 48 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 cb 00 00 00 48 85 c0 79 44 4c 8b 2d 9f 75 [22675.066692] RSP: 002b:00007ffdae24ecd0 EFLAGS: 00000246 ORIG_RAX: 0000000000000148 [22675.067155] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fac51c898b9 [22675.067628] RDX: 0000000000000001 RSI: 000055990bc5bf90 RDI: 0000000000000003 [22675.068113] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002 [22675.068592] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [22675.069074] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000002 Or is there anything I missed? Thanks. > + > +echo "Silence is golden" > +status=0 > +exit > diff --git a/tests/generic/group b/tests/generic/group > index aa969bcb..ae2567a0 100644 > --- a/tests/generic/group > +++ b/tests/generic/group > @@ -611,3 +611,4 @@ > 606 auto attr quick dax > 607 auto attr quick dax > 608 auto attr quick dax > +609 auto quick > -- > 2.26.2 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”