Re: [PATCH 1/2] xfs: Add infrastructure needed for error propagation during buffer IO failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 11, 2017 at 03:57:32PM +0200, Carlos Maiolino wrote:
> To be able to resubmit an log item for IO, we need a way to mark an item
> as failed, if, for any reason the buffer which the item belonged to
> failed during writeback.
> 
> Add a new log item callback to be used after an IO completion failure
> and make the needed clean ups.
> 

I think the commit log description should call out the problem with
flush locked items (i.e., that we will currently never resubmit their
buffers) as the motiviation for the patch.

> Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_buf_item.c | 27 ++++++++++++++++++++++++++-
>  fs/xfs/xfs_trans.h    |  5 ++++-
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
> index 0306168..026aed4 100644
> --- a/fs/xfs/xfs_buf_item.c
> +++ b/fs/xfs/xfs_buf_item.c
> @@ -1051,6 +1051,24 @@ xfs_buf_do_callbacks(
>  	}
>  }
>  
> +STATIC void
> +xfs_buf_do_callbacks_fail(
> +	struct xfs_buf		*bp)
> +{
> +	struct xfs_log_item	*lip, *next;
> +	unsigned int		bflags = bp->b_flags;
> +
> +	lip = bp->b_fspriv;
> +	while (lip != NULL) {
> +		next = lip->li_bio_list;
> +
> +		if (lip->li_ops->iop_error)
> +			lip->li_ops->iop_error(lip, bflags);

I still don't see why we need the iop callback here. This type of
callback is typically required when an operation requires some action on
the specific subtype (e.g., _inode_item_error() does one particular
thing to an inode, buf_item_error() might do something different to an
xfs_buf, etc.), but that doesn't appear to be the case here. Indeed, the
next patch shows that the inode item error handler does:

	lip->li_flags |= XFS_LI_FAILED;

... which doesn't even require to dereference the inode_log_item type.

So can we just set the flag directly from xfs_buf_do_callbacks_fail()
and kill of ->iop_error() until/unless we come to a point where it is
actually needed?

> +
> +		lip = next;
> +	}
> +}
> +
>  static bool
>  xfs_buf_iodone_callback_error(
>  	struct xfs_buf		*bp)
> @@ -1153,8 +1171,15 @@ xfs_buf_iodone_callbacks(
>  	 * to run callbacks after failure processing is done so we
>  	 * detect that and take appropriate action.
>  	 */
> -	if (bp->b_error && xfs_buf_iodone_callback_error(bp))
> +	if (bp->b_error && xfs_buf_iodone_callback_error(bp)) {
> +
> +		/*
> +		 * We've got an error during buffer writeback, we need to notify
> +		 * the items in the buffer
> +		 */
> +		xfs_buf_do_callbacks_fail(bp);

xfs_buf_iodone_callback_error() returns true when the I/O has failed. It
also returns true when it has submitted the internal retry[1], however,
so I don't think this is quite correct. We should only mark items as
failed once this internal sequence has completed and the buffer is no
longer under I/O. As it is, this looks like it would mark the items as
failed while they are still under the internal retry I/O (and possibly
leave them marked as such if this retry actually succeeds..?).

Side note: I really dislike the semantics of
xfs_buf_iodone_callback_error() in that I have to read it and the only
call site to re-understand what the return value means every time I look
at it. Could we add a comment above that function that explains the
return value dictates whether to run callbacks while we're working in
this area?

Brian

[1] Recall that every buffer submitted through xfsaild() is quietly
retried one time in the event of I/O error (via XBF_WRITE_FAIL) before
the buffer is unlocked and effectively released back to the AIL. This is
presumably to help deal with transient errors. It is only when this
second I/O fails that the buffer is unlocked and it is up to the AIL to
resubmit the buffer on a subsequent push.

>  		return;
> +	}
>  
>  	/*
>  	 * Successful IO or permanent error. Either way, we can clear the
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index a07acbf..c57181a 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -65,10 +65,12 @@ typedef struct xfs_log_item {
>  
>  #define	XFS_LI_IN_AIL	0x1
>  #define XFS_LI_ABORTED	0x2
> +#define XFS_LI_FAILED	0x3
>  
>  #define XFS_LI_FLAGS \
>  	{ XFS_LI_IN_AIL,	"IN_AIL" }, \
> -	{ XFS_LI_ABORTED,	"ABORTED" }
> +	{ XFS_LI_ABORTED,	"ABORTED" }, \
> +	{ XFS_LI_FAILED,	"FAILED" }
>  
>  struct xfs_item_ops {
>  	void (*iop_size)(xfs_log_item_t *, int *, int *);
> @@ -79,6 +81,7 @@ struct xfs_item_ops {
>  	void (*iop_unlock)(xfs_log_item_t *);
>  	xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t);
>  	void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t);
> +	void (*iop_error)(xfs_log_item_t *, unsigned int bflags);
>  };
>  
>  void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
> -- 
> 2.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux