Re: [PATCH v4 00/17] xfs: flush related error handling cleanups

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 6 May 2020 08:36:43 +1000

On Tue, May 05, 2020 at 07:58:10AM -0400, Brian Foster wrote:
> On Tue, May 05, 2020 at 07:53:07AM +1000, Dave Chinner wrote:
> > On Mon, May 04, 2020 at 10:11:37AM -0400, Brian Foster wrote:
> > > Hi all,
> > > 
> > > I think everything has been reviewed to this point. Only minor changes
> > > noted below in this release. A git repo is available here[1].
> > > 
> > > The only outstanding feedback that I'm aware of is Dave's comment on
> > > patch 7 of v3 [2] regarding the shutdown assert check. I'm not aware of
> > > any means to get through xfs_wait_buftarg() with a dirty buffer that
> > > hasn't undergone the permanant error sequence and thus shut down the fs.
> > 
> > # echo 0 > /sys/fs/xfs/<dev>/fail_at_unmount
> > 
> > And now any error with a "retry forever" config (the default) will
> > be collected by xfs_buftarg_wait() without a preceeding shutdown as
> > xfs_buf_iodone_callback_error() will not treat it as a permanent
> > error during unmount. i.e. this doesn't trigger:
> > 
> >         /* At unmount we may treat errors differently */
> >         if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount)
> >                 goto permanent_error;
> > 
> > and so the error handling just marks it with a write error and lets
> > it go for a write retry in future. These are then collected in
> > xfs_buftarg_wait() as nothing is going to retry them once unmount
> > gets to this point...
> > 
> 
> That doesn't accurately describe the behavior of that configuration,
> though. "Retry forever" means that dirty buffers are going to cycle
> through submission retries and the unmount is going to hang indefinitely
> (on pushing the AIL). Indeed, preventing this unmount hang is the
> original purpose of the fail at unmount knob (commit here[1]).

So this patch is relying on something else hanging before we get to
xfs_wait_buftarg and hence "we don't have to handle that here".

Please document that assumption along with the ASSERT, because if we
change AIL behaviour to prevent retry-forever hangs with freeze
and/or remount-ro we're going to hit this assert...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx