Re: [PATCH] test race when checking i_size on direct i/o read

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Tue, Sep 19, 2017 at 03:36:16PM +0800, Eryu Guan wrote:
> On Fri, Aug 18, 2017 at 03:35:02PM -0500, Eric Sandeen wrote:
> > From: Zheng Liu <wenqing.lz@xxxxxxxxxx>
> > 
> > In this commit a new test case is added to test that i_size races don't
> > occur under dio reads/writes.  We add a program in /src dir, which
> > has a writer to issue some append dio writes.  Meanwhile it has a reader
> > in this test to do some dio reads.  As we expect, reader should read
> > nothing or data with 'a'.  But it might read some data with '0'.
> > 
> > The bug can be reproduced by this test case [1].
> > 
> > 1.  http://patchwork.ozlabs.org/patch/311761/
> > 
> > This ostensibly tests commit:
> > 9fe55eea7 Fix race when checking i_size on direct i/o read
> > 
> > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > Cc: Rich Johnston <rjohnston@xxxxxxx>
> > Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Zheng Liu <wenqing.lz@xxxxxxxxxx>
> > [sandeen: update to recent xfstests, update commitlog]
> > Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> > ---
> > 
> > This test was originally titled:
> > 
> >  xfstests: add a new test case to test i_size updated properly under dio
> > 
> > but I think the issue is more of when and how it's tested, not how
> > it's updated.
> > 
> > Note, this passes on xfs on 4.10, but fails on 4.12.
> > ext4 on 4.10 passes as well but is very slow.
> > 
> > iomap dio maybe?  Not sure yet.
> 
> My test with 4.10 kernel suggested that test still failed there. And I
> digged into this test a bit, and found that it was commit d531d91d6990
> ("xfs: always use unwritten extents for direct I/O writes") introduced
> this failure, which is in v3.14 kernel.
> 
> This is because we start allocating unwritten extents for direct writes
> that can extend i_size, but in xfs_dio_write_end_io() we update in-core
> i_size before converting unwritten extents to real allocations. So a
> racing direct read could find the not-yet converted unwritten extents
> and read zeros instead of actual data.
> 
> But I'm not sure what's the best way to fix it. I think taking exclusive
> iolock instead of shared lock for direct writes that can extend i_size
> could fix the non-aio dio write case, but aio-dio write still fails,
> because in the aio-dio write case we defer end_io to a workqueue, which
> doesn't take any iolock at all..
> 

Can we pass a boolean or flag to xfs_iomap_write_unwritten() to have it
update the incore i_size after unwritten extent conversion? Then move
(or remove) the associated update from xfs_dio_write_end_io().

Brian

> ext4 has no such problem because ext4 converts unwritten extents before
> updating i_size, and ext4 doesn't support appending aio dio writes.
> 
> (Keep the rest of patch untrimmed for reference, as I added linux-xfs to
> cc list.)
> 
> Thanks,
> Eryu
> 
> > 
> > changelog v3:
> >  * rebase against latest xfstests/master branch
> >  * update commit log
> > 
> > changelog v2:
> >  * add '-lpthread' into LLDLIBS
> > 
> > diff --git a/configure.ac b/configure.ac
> > index 57092f1..4663004 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -59,6 +59,7 @@ AC_PACKAGE_NEED_GETXATTR_LIBATTR
> >  AC_PACKAGE_NEED_SYS_ACL_H
> >  AC_PACKAGE_NEED_ACL_LIBACL_H
> >  AC_PACKAGE_NEED_ACLINIT_LIBACL
> > +AC_PACKAGE_NEED_PTHREADMUTEXINIT
> >  
> >  AC_PACKAGE_WANT_GDBM
> >  AC_PACKAGE_WANT_AIO
> > diff --git a/include/builddefs.in b/include/builddefs.in
> > index cb52b99..fcc8b90 100644
> > --- a/include/builddefs.in
> > +++ b/include/builddefs.in
> > @@ -25,6 +25,7 @@ LIBGDBM = @libgdbm@
> >  LIBUUID = @libuuid@
> >  LIBHANDLE = @libhdl@
> >  LIBDM = @libdm@
> > +LIBPTHREAD = @libpthread@
> >  LIBTEST = $(TOPDIR)/lib/libtest.la
> >  prefix = @prefix@
> >  
> > diff --git a/src/Makefile b/src/Makefile
> > index b8aff49..e9419bd 100644
> > --- a/src/Makefile
> > +++ b/src/Makefile
> > @@ -23,7 +23,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
> >  	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
> >  	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
> >  	attr-list-by-handle-cursor-test listxattr dio-interleaved t_dir_type \
> > -	dio-invalidate-cache stat_test t_encrypted_d_revalidate
> > +	dio-invalidate-cache stat_test t_encrypted_d_revalidate diotest
> >  
> >  SUBDIRS =
> >  
> > diff --git a/src/diotest.c b/src/diotest.c
> > new file mode 100644
> > index 0000000..7d2378f
> > --- /dev/null
> > +++ b/src/diotest.c
> > @@ -0,0 +1,166 @@
> > +/*
> > + * Copyright (c) 2013 Alibaba Group.
> > + * All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > + */
> > +
> > +/*
> > + * This is a normal case that we do some append dio writes and meanwhile
> > + * we do some dio reads.  Currently in vfs we don't ensure that i_size
> > + * is updated properly.  Hence the reader will read some data with '0'.
> > + * But we expect that the reader should read nothing or data with 'a'.
> > + */
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +
> > +#include <unistd.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <fcntl.h>
> > +#include <errno.h>
> > +
> > +#include <pthread.h>
> > +
> > +static char *prog;
> > +
> > +struct writer_data {
> > +	int fd;
> > +	size_t blksize;
> > +	char *buf;
> > +};
> > +
> > +static void usage(void)
> > +{
> > +	fprintf(stderr, "usage: %s [FILE]\n", prog);
> > +}
> > +
> > +static void *writer(void *arg)
> > +{
> > +	struct writer_data *data = (struct writer_data *)arg;
> > +	int ret;
> > +
> > +	ret = write(data->fd, data->buf, data->blksize);
> > +	if (ret < 0)
> > +		fprintf(stderr, "write file failed: %s\n", strerror(errno));
> > +
> > +	return NULL;
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	pthread_t tid;
> > +	struct writer_data wdata;
> > +	size_t max_blocks = 128;		/* 128 */
> > +	size_t blksize = 1 * 1024 * 1024;	/* 1M */
> > +	char *rbuf = NULL, *wbuf = NULL;
> > +	int rfd = 0, wfd = 0;
> > +	int i, j;
> > +	int ret = 0;
> > +
> > +	prog = basename(argv[0]);
> > +
> > +	if (argc != 2) {
> > +		usage();
> > +		exit(1);
> > +	}
> > +
> > +	wfd = open(argv[1], O_CREAT|O_DIRECT|O_WRONLY|O_APPEND|O_TRUNC, S_IRWXU);
> > +	if (wfd < 0) {
> > +		fprintf(stderr, "failed to open write file: %s\n",
> > +			strerror(errno));
> > +		exit(1);
> > +	}
> > +
> > +	rfd = open(argv[1], O_DIRECT|O_RDONLY, S_IRWXU);
> > +	if (wfd < 0) {
> > +		fprintf(stderr, "failed to open read file: %s\n",
> > +			strerror(errno));
> > +		ret = 1;
> > +		goto err;
> > +	}
> > +
> > +	/*
> > +	 * We set 1024 as an alignment size for write buf.  Feel free to change
> > +	 * it with 4096.  But the problem is also hitted.
> > +	 */
> > +	if (posix_memalign((void **)&wbuf, 1024, blksize)) {
> > +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> > +		ret = 1;
> > +		goto err;
> > +	}
> > +
> > +	if (posix_memalign((void **)&rbuf, 4096, blksize)) {
> > +		fprintf(stderr, "failed to alloc memory: %s\n", strerror(errno));
> > +		ret = 1;
> > +		goto err;
> > +	}
> > +
> > +	memset(wbuf, 'a', blksize);
> > +	wdata.fd = wfd;
> > +	wdata.blksize = blksize;
> > +	wdata.buf = wbuf;
> > +
> > +	for (i = 0; i < max_blocks; i++) {
> > +		void *retval;
> > +
> > +		if (pthread_create(&tid, NULL, writer, &wdata)) {
> > +			fprintf(stderr, "create thread failed: %s\n",
> > +				strerror(errno));
> > +			ret = 1;
> > +			goto err;
> > +		}
> > +
> > +		memset(rbuf, 'b', blksize);
> > +		do {
> > +			ret = pread(rfd, rbuf, blksize, i * blksize);
> > +			if (ret < 0)
> > +				fprintf(stderr, "read file failed: %s\n",
> > +					strerror(errno));
> > +		} while (ret <= 0);
> > +
> > +		if (pthread_join(tid, &retval)) {
> > +			fprintf(stderr, " pthread join failed: %s\n",
> > +				strerror(errno));
> > +			ret = 1;
> > +			goto err;
> > +		}
> > +
> > +		if (ret >= 0) {
> > +			for (j = 0; j < ret; j ++) {
> > +				if (rbuf[j] != 'a') {
> > +					fprintf(stderr, "encounter an error: "
> > +						"offset %d content %c\n",
> > +						i, rbuf[j]);
> > +					ret = 1;
> > +					goto err;
> > +				}
> > +			}
> > +		}
> > +	}
> > +
> > +err:
> > +	if (rfd)
> > +		close(rfd);
> > +	if (wfd)
> > +		close(wfd);
> > +	if (rbuf)
> > +		free(rbuf);
> > +	if (wbuf)
> > +		free(wbuf);
> > +
> > +	return ret;
> > +}
> > diff --git a/tests/generic/450 b/tests/generic/450
> > new file mode 100755
> > index 0000000..cfb424c
> > --- /dev/null
> > +++ b/tests/generic/450
> > @@ -0,0 +1,56 @@
> > +#! /bin/bash
> > +# FS QA Test No. 450
> > +#
> > +# Test i_size is updated properly under dio read/write
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2013 Alibaba Group.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#-----------------------------------------------------------------------
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1	# failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +    cd /
> > +    rm -f $tmp.* $testfile
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +
> > +# real QA test starts here
> > +_supported_fs generic
> > +_supported_os Linux
> > +
> > +testfile=$TEST_DIR/$seq.$$
> > +
> > +[ -x $here/src/diotest ] || _notrun "diotest not built"
> > +
> > +$here/src/diotest $testfile # > $seqres.full 2>&1 ||
> > +	# _fail "i_size isn't update properly!"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/450.out b/tests/generic/450.out
> > new file mode 100644
> > index 0000000..734761a
> > --- /dev/null
> > +++ b/tests/generic/450.out
> > @@ -0,0 +1 @@
> > +QA output created by 450
> > diff --git a/tests/generic/group b/tests/generic/group
> > index b9cd0e8..a555fa0 100644
> > --- a/tests/generic/group
> > +++ b/tests/generic/group
> > @@ -452,3 +452,4 @@
> >  447 auto quick clone
> >  448 auto quick rw
> >  449 auto quick acl enospc
> > +450 auto rw quick
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux