Re: [PATCH] generic: test eofblocks race with file extending aio dio writes

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Mon, Apr 24, 2017 at 01:32:42PM +0800, Zorro Lang wrote:
> On Thu, Apr 20, 2017 at 02:22:30PM -0400, Brian Foster wrote:
> > On Thu, Apr 20, 2017 at 11:21:03PM +0800, Zorro Lang wrote:
> > > It's possible for post-eof blocks to end up being used for direct I/O
> > > writes. dio write performs an upfront unwritten extent allocation, sends
> > > the dio and then updates the inode size (if necessary) on write
> > > completion. If a file release occurs while a file extending dio write is
> > > in flight, it is possible to mistake the post-eof blocks for speculative
> > > preallocation and incorrectly truncate them from the inode. This means
> > > that the resulting dio write completion can discover a hole and allocate
> > > new blocks rather than perform unwritten extent conversion.
> > > 
> > > A kernel warning can be reproduced by generic/299 on XFS:
> > >   XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \
> > >        file: fs/xfs//xfs_trans.c, line: 309
> > > 
> > 
> > It might be important to point out that this test also exposes file
> > corruption and/or stale data exposure. E.g., the verification of written
> > data is an important distinction between this test and generic/299.
> 
> Sure, I'll add that. Hah, Sorry I pasted your commit log at here directly :)
> 
> > 
> > > The root cause is that xfs_free_eofblocks() uses i_size to truncate
> > > post-eof blocks from the inode, but async, file extending direct writes
> > > do not update i_size until write completion, long after inode locks are
> > > dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode
> > > to the incorrect size.
> > > 
> > > For cover this filesystem corruption testing, write this new case to
> > > check data integrality manually, not only depend on a kernel warning.
> > > 
> > > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx>
> > > ---
> > > 
> > > Hi,
> > > 
> > > This case is similar with generic/114 which use same aio-dio-eof-race.c
> > > program. I was planning to write an aio writer, but I found there's
> > > already a good program written by Eric, so I turn to use it directly :)
> > > 
> > 
> > Well, that is convenient. ;) A few quick notes...
> 
> Yeah, but I've written one, then I found this one T_T..
> 
> > 
> > > This case is different with g/114, this case runs aio-dio-eof-race
> > > with an "open & close" process together. This multi-processes
> > > maybe trigger a free_eofblocks race with file extending dio write.
> > > 
> > > Thanks,
> > > Zorro
> > > 
> > > 
> > >  tests/generic/426     | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/426.out |  2 ++
> > >  tests/generic/group   |  1 +
> > >  3 files changed, 90 insertions(+)
> > >  create mode 100755 tests/generic/426
> > >  create mode 100644 tests/generic/426.out
> > > 
> > > diff --git a/tests/generic/426 b/tests/generic/426
> > > new file mode 100755
> > > index 0000000..ff022ad
> > > --- /dev/null
> > > +++ b/tests/generic/426
> > > @@ -0,0 +1,87 @@
> > > +#! /bin/bash
> > > +# FS QA Test 426
> > > +#
> > > +# Try to trigger a race of free eofblocks and file extending dio writes.
> > > +# A known bug of XFS has been fixed by "e4229d6 xfs: fix eofblocks race
> > > +# with file extending async dio writes"
> > > +#
> > > +#-----------------------------------------------------------------------
> > > +# Copyright (c) 2017 Red Hat Inc.  All Rights Reserved.
> > > +#
> > > +# This program is free software; you can redistribute it and/or
> > > +# modify it under the terms of the GNU General Public License as
> > > +# published by the Free Software Foundation.
> > > +#
> > > +# This program is distributed in the hope that it would be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > +# GNU General Public License for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public License
> > > +# along with this program; if not, write the Free Software Foundation,
> > > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > > +#-----------------------------------------------------------------------
> > > +#
> > > +
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +
> > > +here=`pwd`
> > > +tmp=/tmp/$$
> > > +status=1	# failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > +	cd /
> > > +	rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +
> > > +# remove previous $seqres.full before test
> > > +rm -f $seqres.full
> > > +
> > > +# real QA test starts here
> > > +
> > > +# Modify as appropriate.
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch
> > > +_require_aiodio aio-dio-eof-race
> > > +
> > 
> > It looks like this tool doesn't do a ton of writing... up to 8MB. Has
> > that proven to be a reliable reproducer in your testing so far? I'm kind
> > of wondering if we should update that tool with a flag to accept a
> > configurable file size, but that's not a big deal if the reproducer is
> > reliable.
> 
> Yes, compare with the original reproducer of this bug, aio-dio-eof-race
> do leass writing. For reproduce this bug, I increase the frequence of
> open & close (no sleep 1), then the bug still can be reproduced as my
> testing with very high reproducible.
> 
> I was thinking if I should change aio-dio-eof-race.c, because I don't know
> if it always can be 100% reproduced on high performance machine.
> 

Ok, well it may be worth a try on a different environment, though I
would expect it to still work.

> > 
> > > +# limit the filesystem size, to save the time of filling filesystem
> > > +_scratch_mkfs_sized $((1024 * 1024 * 1024)) >>$seqres.full 2>&1
> > > +_scratch_mount
> > > +
> > 
> > That's 1G... I wonder if we can get away with smaller (a couple hundred
> > MB perhaps?), since we have to fill the whole thing first.
> 
> Sure, how about 512MB or 256MB? What's the minimum hint size of XFS?
> 

IMO, the smallest you can possibly use, the better. I'm not sure we have
a specific min size, it may depend on what mkfs options are provided.
256MB sounds more reasonable to me...? This is something that is
probably going to run during every auto xfstests run, so the less time
spent doing unnecessary test preparation like stamping unused fs blocks
with a pattern, the easier it is to keep the overall runtime down.

Brian

> > 
> > > +# try to write more bytes than filesystem size to fill the filesystem
> > > +$XFS_IO_PROG -f -c "pwrite -S 0x55 0 $((1024 * 1024 * 1024 * 2))" \
> > > +	     $SCRATCH_MNT/fillfs-$seq 2>/dev/null
> > > +rm -f $SCRATCH_MNT/fillfs-$seq
> > > +
> > > +# start a background aio writer, which does several extending loops
> > > +# internally and check data integrality
> > > +$AIO_TEST $SCRATCH_MNT/tst-aio-dio-eof-race.$seq &
> > > +aio_pid=$!
> > > +
> > > +# open & close the file frequently, to trigger xfs_free_eofblocks
> > > +while true; do
> > > +	cat $SCRATCH_MNT/tst-aio-dio-eof-race.$seq >/dev/null 2>&1
> > > +done &
> > > +open_close_pid=$!
> > > +
> > 
> > We probably want to start the open/close loop before the writer to
> > reduce the chances of the writer doing much work before the open/close
> > loop actually starts.
> > 
> > I also wonder if something like the following would be better than using
> > cat:
> > 
> > while true; do
> > 	exec 3<>$SCRATCH_MNT/tst-aio-dio-eof-race.$seq
> > 	exec 3<>-
> > done
> > 
> > ... because we can skip useless reads of the file. Something like
> > '$XFS_IO -c open <file>' may be another option as well.
> 
> Sure, that's better !
> 
> > 
> > Brian
> > 
> > > +wait $aio_pid
> > > +status=$?
> > > +kill $open_close_pid
> > > +wait $open_close_pid
> > > +if [ $status -ne 0 ]; then
> > > +	od -t x1 $SCRATCH_MNT/tst-aio-dio-eof-race.$seq >> $seqres.full
> > > +	exit
> > > +fi
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/426.out b/tests/generic/426.out
> > > new file mode 100644
> > > index 0000000..ad7a01a
> > > --- /dev/null
> > > +++ b/tests/generic/426.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 426
> > > +Success, all done.
> > > diff --git a/tests/generic/group b/tests/generic/group
> > > index 6d6e4f6..70c36f9 100644
> > > --- a/tests/generic/group
> > > +++ b/tests/generic/group
> > > @@ -428,3 +428,4 @@
> > >  423 auto quick
> > >  424 auto quick
> > >  425 auto quick attr
> > > +426 auto aio rw
> > > -- 
> > > 2.7.4
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux