On Fri, Aug 05, 2022 at 09:04:38AM +1000, Dave Chinner wrote: > On Thu, Aug 04, 2022 at 11:06:28AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > If a blkdev_issue_flush fails, fsync needs to report that to upper > > levels. Modify xfs_file_fsync to capture the errors, while trying to > > flush as much data and log updates to disk as possible. > > > > If log writes cannot flush the data device, we need to shut down the log > > immediately because we've violated a log invariant. Modify this code to > > check the return value of blkdev_issue_flush as well. > > > > This behavior seems to go back to about 2.6.15 or so, which makes this > > fixes tag a bit misleading. > > > > Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187 > > Fixes: b5071ada510a ("xfs: remove xfs_blkdev_issue_flush") > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > --- > > fs/xfs/xfs_file.c | 22 ++++++++++++++-------- > > fs/xfs/xfs_log.c | 11 +++++++++-- > > 2 files changed, 23 insertions(+), 10 deletions(-) > > Looks good, couple of minor nits you can take or leave. > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> > > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > > index 5a171c0b244b..a02000be931b 100644 > > --- a/fs/xfs/xfs_file.c > > +++ b/fs/xfs/xfs_file.c > > @@ -142,7 +142,7 @@ xfs_file_fsync( > > { > > struct xfs_inode *ip = XFS_I(file->f_mapping->host); > > struct xfs_mount *mp = ip->i_mount; > > - int error = 0; > > + int error, err2; > > int log_flushed = 0; > > > > trace_xfs_file_fsync(ip); > > @@ -163,18 +163,21 @@ xfs_file_fsync( > > * inode size in case of an extending write. > > */ > > if (XFS_IS_REALTIME_INODE(ip)) > > - blkdev_issue_flush(mp->m_rtdev_targp->bt_bdev); > > + error = blkdev_issue_flush(mp->m_rtdev_targp->bt_bdev); > > else if (mp->m_logdev_targp != mp->m_ddev_targp) > > - blkdev_issue_flush(mp->m_ddev_targp->bt_bdev); > > + error = blkdev_issue_flush(mp->m_ddev_targp->bt_bdev); > > > > /* > > * Any inode that has dirty modifications in the log is pinned. The > > - * racy check here for a pinned inode while not catch modifications > > + * racy check here for a pinned inode will not catch modifications > > * that happen concurrently to the fsync call, but fsync semantics > > * only require to sync previously completed I/O. > > */ > > - if (xfs_ipincount(ip)) > > - error = xfs_fsync_flush_log(ip, datasync, &log_flushed); > > + if (xfs_ipincount(ip)) { > > + err2 = xfs_fsync_flush_log(ip, datasync, &log_flushed); > > + if (!error && err2) > > + error = err2; > > This is better done as > > if (err2 && !error) > ..... > > Because we only care about the value of error if err2 is non zero. > Hence for normal operation where there are no errors, checking err2 > first is less code to execute as error never needs to be checked... Ok, fixed. > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c > > index 4b1c0a9c6368..15d7cdc7a632 100644 > > --- a/fs/xfs/xfs_log.c > > +++ b/fs/xfs/xfs_log.c > > @@ -1925,9 +1925,16 @@ xlog_write_iclog( > > * device cache first to ensure all metadata writeback covered > > * by the LSN in this iclog is on stable storage. This is slow, > > * but it *must* complete before we issue the external log IO. > > + * > > + * If the flush fails, we cannot conclude that past metadata > > + * writeback from the log succeeded, which is effectively a > > * writeback from the log succeeded, and repeating > * the flush from iclog IO is not possible. Hence we have to > * shut down with log IO error to avoid shutdown > * re-entering this path and erroring out here again. > */ I decided to tweak the paragraph a little: /* * If the flush fails, we cannot conclude that past metadata * writeback from the log succeeded. Repeating the flush is * not possible, hence we must shut down with log IO error to * avoid shutdown re-entering this path and erroring out * again. */ Thanks for reviewing! :) --D > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx