Re: [PATCH] xfs: return errors from partial I/O failures to files

On Wed, Aug 26, 2015 at 03:06:36PM -0400, David Jeffery wrote:
> There is an issue with xfs's error reporting in some cases of I/O partially
> failing and partially succeeding. Calls like fsync() can report success even
> though not all I/O was successful.

Hi David,

I read your bug report last night and after considering all the work
you put into it, I was going to ask if you wanted to finish off the
job by writing the patch to fix it. But you beat me to it.

Nice work! :)

> The issue can occur when there are multiple bios per xfs_ioend struct.
> Each call to xfs_end_bio() for a bio completing will write a value to
> ioend->io_error.  If a successful bio completes after any failed bio, no
> error is reported due to it writing 0 over the error code set by any failed bio.
> The I/O error information is now lost and when the ioend is completed
> only success is reported back up the filesystem stack.

It's worth mentioning the case that this was seen in - a single
failed disk in a raid 0 stripe, and the error from the bio to the
failed disk was overwritten by the successes from the bios to the
other disks.
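
To make the overwrite concrete, here is a rough userspace model of the
old completion behaviour. It is only a sketch: struct fake_ioend,
end_bio_overwrite() and complete_stripe_overwrite() are made-up names
for illustration, not kernel code, and each array entry stands in for
the completion status of one bio sent to one stripe member.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ioend error field; not the kernel struct. */
struct fake_ioend {
	int io_error;	/* starts at 0, like ioend->io_error at allocation */
};

/* Models the pre-patch logic: every bio completion overwrites io_error. */
static void end_bio_overwrite(struct fake_ioend *ioend, int uptodate, int error)
{
	ioend->io_error = uptodate ? 0 : error;
}

/*
 * Complete one bio per stripe member, in array order, and return the
 * error the ioend would report. results[i] is 0 for success or a
 * negative errno for a failed bio.
 */
static int complete_stripe_overwrite(const int *results, size_t n)
{
	struct fake_ioend ioend = { .io_error = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		end_bio_overwrite(&ioend, results[i] == 0, results[i]);
	return ioend.io_error;
}
```

With completions arriving as { -EIO, 0, 0 } the model reports 0, so
the failure is lost. The error only survives if the failed bio happens
to complete last.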

FWIW, I think that we also need to create an xfstest for this case,
because it's clear that this is a big hole in our test coverage
(i.e. partial block device failure). It might be best to talk to
Eryu (cc'd) to get your reproducer converted into an xfstest case
that we can then test all filesystems against?

> xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
> being clear.  ioend->io_error is initialized to 0 at allocation so only needs
> to be updated by any failed bio structs. This ensures an error can be reported
> to the application.
> 
> Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> ---

Best to add a "cc: <stable@xxxxxxxxxxxxxxx>" so that it gets pushed
back to all the stable kernels automatically when it hits Linus'
tree.

One minor change to the fix:

> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 3859f5e..b82b128 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -356,7 +356,8 @@ xfs_end_bio(
>  {
>  	xfs_ioend_t		*ioend = bio->bi_private;
>  
> -	ioend->io_error = test_bit(BIO_UPTODATE, &bio->bi_flags) ? 0 : error;
> +	if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
> +		ioend->io_error = error;

We should preserve the original error that was reported, rather than
report the last one. ioend->io_error is always initialised to zero,
so we can simply do:

	if (!ioend->io_error && !test_bit(BIO_UPTODATE, &bio->bi_flags))
		ioend->io_error = error;
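
As a rough userspace illustration of the difference between the two
variants (a sketch only: struct fake_ioend, the two end_bio_*() helpers
and complete_all() are made-up names, not kernel code), consider an
ioend whose bios fail with different errors:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ioend error field; not the kernel struct. */
struct fake_ioend {
	int io_error;	/* starts at 0, like ioend->io_error at allocation */
};

/* Models the patch as posted: any failed bio stores its error, last one wins. */
static void end_bio_last_error(struct fake_ioend *ioend, int uptodate, int error)
{
	if (!uptodate)
		ioend->io_error = error;
}

/* Models the suggested change: only the first failure is recorded. */
static void end_bio_first_error(struct fake_ioend *ioend, int uptodate, int error)
{
	if (!ioend->io_error && !uptodate)
		ioend->io_error = error;
}

typedef void (*end_bio_fn)(struct fake_ioend *, int, int);

/*
 * Run the given completion handler over the bio results in array order
 * and return the error the ioend would report. results[i] is 0 for
 * success or a negative errno for a failed bio.
 */
static int complete_all(end_bio_fn end_bio, const int *results, size_t n)
{
	struct fake_ioend ioend = { .io_error = 0 };
	size_t i;

	for (i = 0; i < n; i++)
		end_bio(&ioend, results[i] == 0, results[i]);
	return ioend.io_error;
}
```

For completions arriving as { 0, -EIO, 0, -ENOSPC }, the posted version
reports -ENOSPC while the first-error version reports -EIO. Both keep a
later success from clearing the error.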

Can you update the patch and resend it? I've got a couple of other
fixes that I need to push to the for-next tree in the next couple of
days (i.e. before the 4.3 merge window opens) and I'd like to get
this one in as well.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx