On Thu, Apr 20, 2017 at 02:22:30PM -0400, Brian Foster wrote: > On Thu, Apr 20, 2017 at 11:21:03PM +0800, Zorro Lang wrote: > > It's possible for post-eof blocks to end up being used for direct I/O > > writes. dio write performs an upfront unwritten extent allocation, sends > > the dio and then updates the inode size (if necessary) on write > > completion. If a file release occurs while a file extending dio write is > > in flight, it is possible to mistake the post-eof blocks for speculative > > preallocation and incorrectly truncate them from the inode. This means > > that the resulting dio write completion can discover a hole and allocate > > new blocks rather than perform unwritten extent conversion. > > > > A kernel warning can be reproduced by generic/299 on XFS: > > XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \ > > file: fs/xfs//xfs_trans.c, line: 309 > > > > It might be important to point out that this test also exposes file > corruption and/or stale data exposure. E.g., the verification of written > data is an important distinction between this test and generic/299. Sure, I'll add that. Hah, Sorry I pasted your commit log at here directly :) > > > The root cause is that xfs_free_eofblocks() uses i_size to truncate > > post-eof blocks from the inode, but async, file extending direct writes > > do not update i_size until write completion, long after inode locks are > > dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode > > to the incorrect size. > > > > For cover this filesystem corruption testing, write this new case to > > check data integrality manually, not only depend on a kernel warning. > > > > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx> > > --- > > > > Hi, > > > > This case is similar with generic/114 which use same aio-dio-eof-race.c > > program. I was planning to write an aio writer, but I found there's > > already a good program written by Eric, so I turn to use it directly :) > > > > Well, that is convenient. ;) A few quick notes... Yeah, but I've written one, then I found this one T_T.. > > > This case is different with g/114, this case runs aio-dio-eof-race > > with an "open & close" process together. This multi-processes > > maybe trigger a free_eofblocks race with file extending dio write. > > > > Thanks, > > Zorro > > > > > > tests/generic/426 | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > tests/generic/426.out | 2 ++ > > tests/generic/group | 1 + > > 3 files changed, 90 insertions(+) > > create mode 100755 tests/generic/426 > > create mode 100644 tests/generic/426.out > > > > diff --git a/tests/generic/426 b/tests/generic/426 > > new file mode 100755 > > index 0000000..ff022ad > > --- /dev/null > > +++ b/tests/generic/426 > > @@ -0,0 +1,87 @@ > > +#! /bin/bash > > +# FS QA Test 426 > > +# > > +# Try to trigger a race of free eofblocks and file extending dio writes. > > +# A known bug of XFS has been fixed by "e4229d6 xfs: fix eofblocks race > > +# with file extending async dio writes" > > +# > > +#----------------------------------------------------------------------- > > +# Copyright (c) 2017 Red Hat Inc. All Rights Reserved. > > +# > > +# This program is free software; you can redistribute it and/or > > +# modify it under the terms of the GNU General Public License as > > +# published by the Free Software Foundation. > > +# > > +# This program is distributed in the hope that it would be useful, > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > +# GNU General Public License for more details. > > +# > > +# You should have received a copy of the GNU General Public License > > +# along with this program; if not, write the Free Software Foundation, > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > > +#----------------------------------------------------------------------- > > +# > > + > > +seq=`basename $0` > > +seqres=$RESULT_DIR/$seq > > +echo "QA output created by $seq" > > + > > +here=`pwd` > > +tmp=/tmp/$$ > > +status=1 # failure is the default! > > +trap "_cleanup; exit \$status" 0 1 2 3 15 > > + > > +_cleanup() > > +{ > > + cd / > > + rm -f $tmp.* > > +} > > + > > +# get standard environment, filters and checks > > +. ./common/rc > > +. ./common/filter > > + > > +# remove previous $seqres.full before test > > +rm -f $seqres.full > > + > > +# real QA test starts here > > + > > +# Modify as appropriate. > > +_supported_fs generic > > +_supported_os Linux > > +_require_scratch > > +_require_aiodio aio-dio-eof-race > > + > > It looks like this tool doesn't do a ton of writing... up to 8MB. Has > that proven to be a reliable reproducer in your testing so far? I'm kind > of wondering if we should update that tool with a flag to accept a > configurable file size, but that's not a big deal if the reproducer is > reliable. Yes, compare with the original reproducer of this bug, aio-dio-eof-race do leass writing. For reproduce this bug, I increase the frequence of open & close (no sleep 1), then the bug still can be reproduced as my testing with very high reproducible. I was thinking if I should change aio-dio-eof-race.c, because I don't know if it always can be 100% reproduced on high performance machine. > > > +# limit the filesystem size, to save the time of filling filesystem > > +_scratch_mkfs_sized $((1024 * 1024 * 1024)) >>$seqres.full 2>&1 > > +_scratch_mount > > + > > That's 1G... I wonder if we can get away with smaller (a couple hundred > MB perhaps?), since we have to fill the whole thing first. Sure, how about 512MB or 256MB? What's the minimum hint size of XFS? > > > +# try to write more bytes than filesystem size to fill the filesystem > > +$XFS_IO_PROG -f -c "pwrite -S 0x55 0 $((1024 * 1024 * 1024 * 2))" \ > > + $SCRATCH_MNT/fillfs-$seq 2>/dev/null > > +rm -f $SCRATCH_MNT/fillfs-$seq > > + > > +# start a background aio writer, which does several extending loops > > +# internally and check data integrality > > +$AIO_TEST $SCRATCH_MNT/tst-aio-dio-eof-race.$seq & > > +aio_pid=$! > > + > > +# open & close the file frequently, to trigger xfs_free_eofblocks > > +while true; do > > + cat $SCRATCH_MNT/tst-aio-dio-eof-race.$seq >/dev/null 2>&1 > > +done & > > +open_close_pid=$! > > + > > We probably want to start the open/close loop before the writer to > reduce the chances of the writer doing much work before the open/close > loop actually starts. > > I also wonder if something like the following would be better than using > cat: > > while true; do > exec 3<>$SCRATCH_MNT/tst-aio-dio-eof-race.$seq > exec 3<>- > done > > ... because we can skip useless reads of the file. Something like > '$XFS_IO -c open <file>' may be another option as well. Sure, that's better ! > > Brian > > > +wait $aio_pid > > +status=$? > > +kill $open_close_pid > > +wait $open_close_pid > > +if [ $status -ne 0 ]; then > > + od -t x1 $SCRATCH_MNT/tst-aio-dio-eof-race.$seq >> $seqres.full > > + exit > > +fi > > + > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/generic/426.out b/tests/generic/426.out > > new file mode 100644 > > index 0000000..ad7a01a > > --- /dev/null > > +++ b/tests/generic/426.out > > @@ -0,0 +1,2 @@ > > +QA output created by 426 > > +Success, all done. > > diff --git a/tests/generic/group b/tests/generic/group > > index 6d6e4f6..70c36f9 100644 > > --- a/tests/generic/group > > +++ b/tests/generic/group > > @@ -428,3 +428,4 @@ > > 423 auto quick > > 424 auto quick > > 425 auto quick attr > > +426 auto aio rw > > -- > > 2.7.4 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe fstests" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html