Re: [PATCH v3 07/17] xfs: ratelimit unmount time per-buffer I/O error alert

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 01, 2020 at 08:07:43AM +1000, Dave Chinner wrote:
> On Wed, Apr 29, 2020 at 01:21:43PM -0400, Brian Foster wrote:
> > At unmount time, XFS emits an alert for every in-core buffer that
> > might have undergone a write error. In practice this behavior is
> > probably reasonable given that the filesystem is likely short lived
> > once I/O errors begin to occur consistently. Under certain test or
> > otherwise expected error conditions, this can spam the logs and slow
> > down the unmount.
> > 
> > Now that we have a ratelimit mechanism specifically for buffer
> > alerts, reuse it for the per-buffer alerts in xfs_wait_buftarg().
> > Also lift the final repair message out of the loop so it always
> > prints and assert that the metadata error handling code has shut
> > down the fs.
> > 
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_buf.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 594d5e1df6f8..8f0f605de579 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
...
> > @@ -1685,17 +1686,23 @@ xfs_wait_buftarg(
> >  			bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
> >  			list_del_init(&bp->b_lru);
> >  			if (bp->b_flags & XBF_WRITE_FAIL) {
> > -				xfs_alert(btp->bt_mount,
> > +				write_fail = true;
> > +				xfs_buf_alert_ratelimited(bp,
> > +					"XFS: Corruption Alert",
> >  "Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
> >  					(long long)bp->b_bn);
> > -				xfs_alert(btp->bt_mount,
> > -"Please run xfs_repair to determine the extent of the problem.");
> >  			}
> >  			xfs_buf_rele(bp);
> >  		}
> >  		if (loop++ != 0)
> >  			delay(100);
> >  	}
> > +
> > +	if (write_fail) {
> > +		ASSERT(XFS_FORCED_SHUTDOWN(btp->bt_mount));
> 
> I think this is incorrect. A metadata write that is set to retry
> forever and is failing because of a bad sector or some other
> persistent device error will not shut down the filesystem, but still
> be reported here as a failure. Hence we can easily get here without
> a filesystem shutdown having occurred...
> 

I'm confused by your comment because I don't see how we get here to free
a dirty buffer without the filesystem already shut down. AFAICT we're
going to spin (in freeze or unmount) until all outstanding buffers are
written back or converted to permanent errors, which shuts down the fs.
Hm?

Note that I don't object to turning this into a direct
xfs_force_shutdown() call as a fallback, if that's what you're asking
for (which isn't totally clear to me either). Obviously my assumption is
that we're already shut down anyways, but I'd like to get on the same
page here first...

Brian

> Cheers,
> 
> Dave.
> 
> 
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux