Re: [PATCH 1/2] xfs: inode recovery readahead can race with inode buffer creation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 11, 2016 at 02:24:53PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> When we do inode readahead in log recovery, we do can do the
> readahead before we've replayed the icreate transaction that stamps
> the buffer with inode cores. The inode readahead verifier catches
> this and marks the buffer as !done to indicate that it doesn't yet
> contain valid inodes.
> 
> In adding buffer error notification  (i.e. setting b_error = -EIO at
> the same time as as we clear the done flag) to such a readahead
> verifier failure, we can then get subsequent inode recovery failing
> with this error:
> 
> XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
> 
> This occurs when readahead completion races with icreate item replay
> such as:
> 
> 	inode readahead
> 		find buffer
> 		lock buffer
> 		submit RA io
> 	....
> 	icreate recovery
> 	    xfs_trans_get_buffer
> 		find buffer
> 		lock buffer
> 		<blocks on RA completion>
> 	.....
> 	<ra completion>
> 		fails verifier
> 		clear XBF_DONE
> 		set bp->b_error = -EIO
> 		release and unlock buffer
> 	<icreate gains lock>
> 	icreate initialises buffer
> 	marks buffer as done
> 	adds buffer to delayed write queue
> 	releases buffer
> 
> At this point, we have an initialised inode buffer that is up to
> date but has an -EIO state registered against it. When we finally
> get to recovering an inode in that buffer:
> 
> 	inode item recovery
> 	    xfs_trans_read_buffer
> 		find buffer
> 		lock buffer
> 		sees XBF_DONE is set, returns buffer
> 	    sees bp->b_error is set
> 		fail log recovery!
> 
> Essentially, we need xfs_trans_get_buf_map() to clear the error status of
> the buffer when doing a lookup. This function returns uninitialised
> buffers, so the buffer returned can not be in an error state and
> none of the code that uses this function expects b_error to be set
> on return. Indeed, there is an ASSERT(!bp->b_error); in the
> transaction case in xfs_trans_get_buf_map() that would have caught
> this if log recovery used transactions....
> 
> This patch firstly changes the inode readahead failure to set -EIO
> on the buffer, and secondly changes xfs_buf_get_map() to never
> return a buffer with an error state set so this first change doesn't
> cause unexpected log recovery failures.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>  fs/xfs/libxfs/xfs_inode_buf.c | 12 +++++++-----
>  fs/xfs/xfs_buf.c              |  7 +++++++
>  2 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
> index 1b8d98a..ff17c48 100644
> --- a/fs/xfs/libxfs/xfs_inode_buf.c
> +++ b/fs/xfs/libxfs/xfs_inode_buf.c
> @@ -62,11 +62,12 @@ xfs_inobp_check(
>   * has not had the inode cores stamped into it. Hence for readahead, the buffer
>   * may be potentially invalid.
>   *
> - * If the readahead buffer is invalid, we don't want to mark it with an error,
> - * but we do want to clear the DONE status of the buffer so that a followup read
> - * will re-read it from disk. This will ensure that we don't get an unnecessary
> - * warnings during log recovery and we don't get unnecssary panics on debug
> - * kernels.
> + * If the readahead buffer is invalid, we need to mark it with an error and
> + * clear the DONE status of the buffer so that a followup read will re-read it
> + * from disk. We don't report the error otherwise to avoid warnings during log
> + * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
> + * because all we want to do is say readahead failed; there is no-one to report
> + * the error to, so this will distinguish it from a non-ra verifier failure.
>   */
>  static void
>  xfs_inode_buf_verify(
> @@ -93,6 +94,7 @@ xfs_inode_buf_verify(
>  						XFS_RANDOM_ITOBP_INOTOBP))) {
>  			if (readahead) {
>  				bp->b_flags &= ~XBF_DONE;
> +				xfs_buf_ioerror(bp, -EIO);
>  				return;
>  			}
>  
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 45a8ea7..ae86b16 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -604,6 +604,13 @@ found:
>  		}
>  	}
>  
> +	/*
> +	 * Clear b_error if this is a lookup from a caller that doesn't expect
> +	 * valid data to be found in the buffer.
> +	 */
> +	if (!(flags & XBF_READ))
> +		xfs_buf_ioerror(bp, 0);
> +
>  	XFS_STATS_INC(target->bt_mount, xb_get);
>  	trace_xfs_buf_get(bp, flags, _RET_IP_);
>  	return bp;
> -- 
> 2.5.0
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux