Re: [PATCH] test extending sub-block AIO writes for races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 30, 2015 at 07:53:39AM -0500, Eric Sandeen wrote:
> 
> 
> On 9/30/15 6:47 AM, Brian Foster wrote:
> > On Tue, Sep 29, 2015 at 08:03:31PM -0500, Eric Sandeen wrote:
> >> This tests bfoster's
> >> [PATCH 1/2] xfs: always drain dio before extending aio write submission
> >> patch; it launches four adjacent 1k IOs past EOF, then reads back
> >> to see if we have 4k worth of the data we wrote, or something else - 
> >> possibly zeros from sub-block zeroing and eof racing.
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> >> ---
> >>
> > 
> > Thanks for the test! A few high level comments...
> > 
> >>
> >> diff --git a/src/aio-dio-regress/aio-dio-eof-race.c b/src/aio-dio-regress/aio-dio-eof-race.c
> >> new file mode 100644
> >> index 0000000..c1ce695
> >> --- /dev/null
> >> +++ b/src/aio-dio-regress/aio-dio-eof-race.c
> >> @@ -0,0 +1,170 @@
> >> +/*
> >> + * Launch 4 file-extending extending sub-block AIOs and ensure that we
> >> + * don't see data corruption when they're all complete.
> >> + *
> > 
> > It might be good to point out that this stresses EOF zeroing, which also
> > means a key point is not just file-extending I/O, but file-extending I/O
> > beyond current EOF (e.g., I/O that creates holes between the current EOF
> > and start of the new I/O).
> 
> yeah, ok, good to be specific.
> 
> >> + * Copyright (C) 2015 Red Hat, Inc. All Rights reserved.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License as published by
> >> + * the Free Software Foundation; either version 2 of the License, or
> >> + * (at your option) any later version.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, write to the Free Software
> >> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> >> + */
> >> +#include <sys/stat.h>
> >> +#include <sys/types.h>
> >> +#include <errno.h>
> >> +#include <fcntl.h>
> >> +#include <stdio.h>
> >> +#include <stdlib.h>
> >> +#include <unistd.h>
> >> +#include <ctype.h>
> >> +
> >> +#include <libaio.h>
> >> +
> >> +#define BUF_SIZE	4096
> > 
> > Have you considered making the buffer size variable? As is currently
> > written, I don't think this will reproduce the problem on 512b or 1024b
> > fsb filesystems because at least some of the file-extending I/Os must
> > not be block aligned for it to trigger.
> 
> yeah, I meant to make it smaller, then got distracted, tbh ;
> 
> But I think launching 4x512 IOs should always do the trick, right?
> 

I would think so, though I don't know how many unaligned I/Os were
necessary per io_submit() to make the test effective (e.g., 3 of 4 on
4k/2k fsb vs. 2 of 4 on 1k fsb). Might be worth a try to be sure.

> And it's impossible to repro on a 512b fs block I think, because
> we can't have any sub-block/unaligned IOs?
> 
> So I think that perhaps switching it to 2048 & doing 4x512 IOs
> would suffice, right?
> 

Yeah, good point. I don't think this reproducer could trigger the
problem with 512b FSBs. I think the corruption is still possible on such
fs', but it would probably require extending I/O to a file with dirty
pages and post-eof speculative preallocation. In short, I think that's
probably a separate reproducer.

Given that and if the above is effective, perhaps it's good enough to
adjust the default buffer size and whatnot and forget the tunables and
things.

Brian

> >> +#define IO_PATTERN	0xab
> >> +
> >> +void
> >> +dump_buffer(
> >> +	void	*buf,
> >> +	off64_t	offset,
> >> +	ssize_t	len)
> >> +{
> >> +	int	i, j;
> >> +	char	*p;
> >> +	int	new;
> >> +	
> >> +	for (i = 0, p = (char *)buf; i < len; i += 16) {
> >> +		char    *s = p;
> >> +
> >> +		if (i && !memcmp(p, p - 16, 16)) {
> >> +			new = 0;
> >> +		} else {
> >> +			if (i)
> >> +				printf("*\n");
> >> +			new = 1;
> >> +		}
> >> +
> >> +		if (!new) {
> >> +			p += 16;
> >> +			continue;
> >> +		}
> >> +
> >> +		printf("%08llx  ", (unsigned long long)offset + i);
> >> +		for (j = 0; j < 16 && i + j < len; j++, p++)
> >> +			printf("%02x ", *p);
> >> +		printf(" ");
> >> +		for (j = 0; j < 16 && i + j < len; j++, s++) {
> >> +			if (isalnum((int)*s))
> >> +				printf("%c", *s);
> >> +			else
> >> +				printf(".");
> >> +		}
> >> +		printf("\n");
> >> +
> >> +	}
> >> +	printf("%08llx\n", (unsigned long long)offset + i);
> >> +}
> >> +
> >> +int main(int argc, char *argv[])
> >> +{
> >> +	struct io_context *ctx = NULL;
> >> +	struct io_event evs[4];
> >> +	struct iocb iocb1, iocb2, iocb3, iocb4;
> >> +	struct iocb *iocbs[] = { &iocb1, &iocb2, &iocb3, &iocb4 };
> >> +	void *buf;
> >> +	struct stat statbuf;
> >> +	char cmp_buf[BUF_SIZE];
> >> +	int fd, err = 0;
> >> +	off_t eof;
> >> +
> >> +	fd = open(argv[1], O_DIRECT | O_CREAT | O_TRUNC | O_RDWR, 0600);
> >> +	if (fd == -1) {
> >> +		perror("open");
> >> +		return 1;
> >> +	}
> >> +
> >> +	err = posix_memalign(&buf, BUF_SIZE, BUF_SIZE);
> > 
> > The alignment should technically be PAGE_SIZE (even though BUF_SIZE ==
> > 4096), right?
> 
> hm, tbh this just came along from the other AIO tests I copied from ;)
> Sure, I can do it page-sized, that'd be better.
> 
> >> +	if (err) {
> >> +		fprintf(stderr, "error %s during %s\n",
> >> +			strerror(err),
> >> +			"posix_memalign");
> >> +		return 1;
> >> +	}
> >> +	memset(cmp_buf, IO_PATTERN, BUF_SIZE);
> >> +
> >> +	err = io_setup(4, &ctx);
> >> +	if (err) {
> >> +		fprintf(stderr, "error %s during %s\n",
> >> +			strerror(err),
> >> +			"io_setup");
> >> +		return 1;
> >> +	}
> >> +
> >> +	eof = 0;
> >> +
> >> +	/* Keep extending until 8MB */
> >> +	while (eof < 8 * 1024 * 1024) {
> >> +		memset(buf, IO_PATTERN, BUF_SIZE);
> >> +
> >> +		fstat(fd, &statbuf);
> >> +		eof = statbuf.st_size;
> >> +
> >> +		/*
> >> +		 * 4 ios, racing to extend EOF, combined they write full BUF_SIZE
> >> +		 */
> > 
> > I would expand on this comment with something like the following:
> > 
> > "Split the buffer into multiple I/Os to send a mix of block
> > aligned/unaligned writes as well as writes that start beyond the current
> > EOF. This stresses things like inode size management and stale block
> > zeroing for races and can lead to data corruption when not handled
> > properly."
> > 
> > ... but feel free to reword that as necessary. I just wanted to point
> > out that 1.) unaligned I/O is required and 2.) write offsets beyond EOF
> > are required.
> 
> ok
> 
>  
> >> +		io_prep_pwrite(&iocb1, fd, buf, BUF_SIZE/4, eof + 0 * BUF_SIZE/4);
> >> +		io_prep_pwrite(&iocb2, fd, buf, BUF_SIZE/4, eof + 1 * BUF_SIZE/4);
> >> +		io_prep_pwrite(&iocb3, fd, buf, BUF_SIZE/4, eof + 2 * BUF_SIZE/4);
> >> +		io_prep_pwrite(&iocb4, fd, buf, BUF_SIZE/4, eof + 3 * BUF_SIZE/4);
> >> +	
> >> +		err = io_submit(ctx, 4, iocbs);
> >> +		if (err != 4) {
> >> +			fprintf(stderr, "error %s during %s\n",
> >> +				strerror(err),
> >> +				"io_submit");
> >> +			return 1;
> >> +		}
> >> +
> >> +		err = io_getevents(ctx, 4, 4, evs, NULL);
> >> +		if (err != 4) {
> >> +			fprintf(stderr, "error %s during %s\n",
> >> +				strerror(err),
> >> +				"io_getevents");
> >> +			return 1;
> >> +		}
> >> +
> >> +		/*
> >> +		 * And then read it back.
> >> +		 *
> >> +		 * Using pread to keep it simple, but AIO has the same effect.
> >> +		 *
> >> +		 * eof is the old eof, we just wrote BUF_SIZE more
> >> +		 */
> >> +		if (pread(fd, buf, BUF_SIZE, eof) != BUF_SIZE) {
> >> +			perror("pread");
> >> +			return 1;
> >> +		}
> >> +
> >> +		/*
> >> +		 * We launched 4 AIOs which, stiched together, should write
> > 
> > Nit:					     stitched
> 
> doh ;)
> 
> >> +		 * a seamless BUF_SIZE worth of IO_PATTERN to the last block
> >> +		 */
> >> +		if (memcmp(buf, cmp_buf, BUF_SIZE)) {
> >> +			printf("corruption while extending from %ld\n", eof);
> >> +			dump_buffer(buf, 0, BUF_SIZE);
> >> +			return 1;
> >> +		}
> >> +	}
> >> +
> >> +	printf("Success, all done.\n");
> >> +	return 0;
> >> +}
> >> diff --git a/tests/generic/326 b/tests/generic/326
> >> new file mode 100755
> >> index 0000000..7db04ac
> >> --- /dev/null
> >> +++ b/tests/generic/326
> >> @@ -0,0 +1,64 @@
> >> +#! /bin/bash
> >> +# FS QA Test No. 326
> >> +#
> >> +# Test races while extending past EOF via sub-block AIO writes
> >> +#
> >> +#-----------------------------------------------------------------------
> >> +# Copyright (c) 2015 Red Hat, Inc.  All Rights Reserved.
> >> +#
> >> +# This program is free software; you can redistribute it and/or
> >> +# modify it under the terms of the GNU General Public License as
> >> +# published by the Free Software Foundation.
> >> +#
> >> +# This program is distributed in the hope that it would be useful,
> >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> +# GNU General Public License for more details.
> >> +#
> >> +# You should have received a copy of the GNU General Public License
> >> +# along with this program; if not, write the Free Software Foundation,
> >> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> >> +#-----------------------------------------------------------------------
> >> +#
> >> +
> >> +seq=`basename $0`
> >> +seqres=$RESULT_DIR/$seq
> >> +echo "QA output created by $seq"
> >> +
> >> +here=`pwd`
> >> +tmp=/tmp/$$
> >> +status=1	# failure is the default!
> >> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >> +
> >> +_cleanup()
> >> +{
> >> +    cd /
> >> +    rm -f $TEST_DIR/tst-aio-dio-eof-race
> >> +}
> >> +
> >> +# get standard environment, filters and checks
> >> +. ./common/rc
> >> +. ./common/filter
> >> +
> >> +_supported_fs generic
> >> +_supported_os Linux
> >> +
> >> +_require_test
> >> +_require_sparse_files
> >> +_require_aiodio aio-dio-eof-race
> >> +
> >> +# Test does 512 byte DIO, so make sure that'll work
> >> +logical_block_size=`_min_dio_alignment $TEST_DEV`
> >> +
> >> +if [ "$logical_block_size" -gt "512" ]; then
> >> +	_notrun "device block size: $logical_block_size greater than 512"
> >> +fi
> >> +
> > 
> > Technically the test splits a 4k buffer into four parts and so sends 1k
> > dio. Not that it really matters here, but I guess I'd update the
> > comments. :)
> > 
> >> +# If 512 isn't a sub-block IO, the test should still pass, so
> >> +# let that go.
> >> +
> > 
> > Ok, so this at least points out the limitation to the test. I think it
> > would be slightly better if the test were configurable as mentioned in
> > the first comment above. For example, the test program could take a
> > block size parameter and "do the right thing" based on that.
> > Alternatively, we could specify buffer size and iocb count as parameters
> > and let the xfstests script send the right params based on the test fs.
> > I guess the latter would complicate things slightly because the program
> > probably wants to ensure full buffers are written in each io_submit()
> > instance for verification purposes.
> > 
> > Just some thoughts... we could leave it as is too. In that case I would
> > suggest to expand the comment above to be specific about which block
> > sizes (512b, 1k) do not result in unaligned I/O.
> 
> ok, I'll make it a bit more universal one way or the other.
> 
> Thanks for the review,
> -Eric
> 
> > Brian
> > 
> >> +# This test does several extending loops internally
> >> +$AIO_TEST $TEST_DIR/tst-aio-dio-eof-race
> >> +
> >> +status=$?
> >> +exit
> >> diff --git a/tests/generic/326.out b/tests/generic/326.out
> >> new file mode 100644
> >> index 0000000..22a3e78
> >> --- /dev/null
> >> +++ b/tests/generic/326.out
> >> @@ -0,0 +1,2 @@
> >> +QA output created by 326
> >> +Success, all done.
> >> diff --git a/tests/generic/group b/tests/generic/group
> >> index 4ae256f..a5f3008 100644
> >> --- a/tests/generic/group
> >> +++ b/tests/generic/group
> >> @@ -207,3 +207,4 @@
> >>  323 auto aio stress
> >>  324 auto fsr quick
> >>  325 auto quick data log
> >> +326 auto quick aio
> >>
> >>
> >> _______________________________________________
> >> xfs mailing list
> >> xfs@xxxxxxxxxxx
> >> http://oss.sgi.com/mailman/listinfo/xfs
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux