Re: [PATCH 2/5] xfs: external logs need to flush data device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 22, 2021 at 11:53:32AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> The recent journal flush/FUA changes replaced the flushing of the
> data device on every iclog write with an up-front async data device
> cache flush. Unfortunately, the assumption of which this was based
> on has been proven incorrect by the flush vs log tail update
> ordering issue. As the fix for that issue uses the
> XLOG_ICL_NEED_FLUSH flag to indicate that data device needs a cache
> flush, we now need to (once again) ensure that an iclog write to
> external logs that need a cache flush to be issued actually issue a
> cache flush to the data device as well as the log device.
> 
> Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions")
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_log.c | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 96434cc4df6e..a3c4d48195d9 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -827,13 +827,6 @@ xlog_write_unmount_record(
>  	/* account for space used by record data */
>  	ticket->t_curr_res -= sizeof(ulf);
>  
> -	/*
> -	 * For external log devices, we need to flush the data device cache
> -	 * first to ensure all metadata writeback is on stable storage before we
> -	 * stamp the tail LSN into the unmount record.
> -	 */
> -	if (log->l_targ != log->l_mp->m_ddev_targp)
> -		blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev);
>  	return xlog_write(log, &vec, ticket, NULL, NULL, XLOG_UNMOUNT_TRANS);
>  }
>  
> @@ -1796,10 +1789,20 @@ xlog_write_iclog(
>  	 * metadata writeback and causing priority inversions.
>  	 */
>  	iclog->ic_bio.bi_opf = REQ_OP_WRITE | REQ_META | REQ_SYNC | REQ_IDLE;
> -	if (iclog->ic_flags & XLOG_ICL_NEED_FLUSH)
> +	if (iclog->ic_flags & XLOG_ICL_NEED_FLUSH) {
>  		iclog->ic_bio.bi_opf |= REQ_PREFLUSH;
> +		/*
> +		 * For external log devices, we also need to flush the data
> +		 * device cache first to ensure all metadata writeback covered
> +		 * by the LSN in this iclog is on stable storage. This is slow,
> +		 * but it *must* complete before we issue the external log IO.

I'm a little confused about what's going on here.  We're about to write
a log record to disk, with h_tail_lsn reflecting the tail of the log and
h_lsn reflecting the current head of the log (i.e. this record).

If the log tail has moved forward since the last log record was written
and this fs has an external log, we need to flush the data device
because the AIL could have written logged items back into the filesystem
and we need to ensure those items have been persisted before we write to
the log the fact that the tail moved forward.  The AIL itself doesn't
issue cache flushes (nor does it need to), so that's why we do that
here.

Why don't we need a flush like this if only FUA is set?  Is it not
possible to write a checkpoint that fits within a single iclog after the
log tail has moved forward?

--D

> +		 */
> +		if (log->l_targ != log->l_mp->m_ddev_targp)
> +			blkdev_issue_flush(log->l_mp->m_ddev_targp->bt_bdev);
> +	}
>  	if (iclog->ic_flags & XLOG_ICL_NEED_FUA)
>  		iclog->ic_bio.bi_opf |= REQ_FUA;
> +
>  	iclog->ic_flags &= ~(XLOG_ICL_NEED_FLUSH | XLOG_ICL_NEED_FUA);
>  
>  	if (xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count)) {
> -- 
> 2.31.1
> 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux