On Thu, Apr 23, 2020 at 02:46:04PM +1000, Dave Chinner wrote: > On Wed, Apr 22, 2020 at 01:54:21PM -0400, Brian Foster wrote: > > At unmount time, XFS emits a warning for every in-core buffer that > > might have undergone a write error. In practice this behavior is > > probably reasonable given that the filesystem is likely short lived > > once I/O errors begin to occur consistently. Under certain test or > > otherwise expected error conditions, this can spam the logs and slow > > down the unmount. > > > > We already have a ratelimit state defined for buffers failing > > writeback. Fold this state into the buftarg and reuse it for the > > unmount time errors. > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > > Looks fine, but I suspect we both missed something here: > xfs_buf_ioerror_alert() was made a ratelimited printk in the last > cycle: > > void > xfs_buf_ioerror_alert( > struct xfs_buf *bp, > xfs_failaddr_t func) > { > xfs_alert_ratelimited(bp->b_mount, > "metadata I/O error in \"%pS\" at daddr 0x%llx len %d error %d", > func, (uint64_t)XFS_BUF_ADDR(bp), bp->b_length, > -bp->b_error); > } > Yeah, I hadn't noticed that. > Hence I think all these buffer error alerts can be brought under the > same rate limiting variable. Something like this in xfs_message.c: > One thing to note is that xfs_alert_ratelimited() ultimately uses the DEFAULT_RATELIMIT_INTERVAL of 5s. The ratelimit we're generalizing here uses 30s (both use a burst of 10). That seems reasonable enough to me for I/O errors so I'm good with the changes below. FWIW, that also means we could just call xfs_buf_alert_ratelimited() from xfs_buf_item_push() if we're also Ok with using an "alert" instead of a "warn." I'm not immediately aware of a reason to use one over the other (xfs_wait_buftarg() already uses alert) so I'll try that unless I hear an objection. The xfs_wait_buftarg() ratelimit presumably remains open coded because it's two separate calls and we probably don't want them to individually count against the limit. Brian > void > xfs_buf_alert_ratelimited( > struct xfs_buf *bp, > const char *rlmsg, > const char *fmt, > ...) > { > struct va_format vaf; > va_list args; > > if (!___ratelimit(&bp->b_target->bt_ioerror_rl, rlmsg) > return; > > va_start(args, fmt); > vaf.fmt = fmt; > vaf.args = &args; > __xfs_printk(KERN_ALERT, bp->b_mount, &vaf); > va_end(args); > } > > and: > > void > xfs_buf_ioerror_alert( > struct xfs_buf *bp, > xfs_failaddr_t func) > { > xfs_buf_alert_ratelimited(bp, "XFS: metadata IO error", > "metadata I/O error in \"%pS\" at daddr 0x%llx len %d error %d", > func, (uint64_t)XFS_BUF_ADDR(bp), bp->b_length, -bp->b_error); > } > > > > --- > > fs/xfs/xfs_buf.c | 13 +++++++++++-- > > fs/xfs/xfs_buf.h | 1 + > > fs/xfs/xfs_buf_item.c | 10 +--------- > > 3 files changed, 13 insertions(+), 11 deletions(-) > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c > > index 7a6bc617f0a9..c28a93d2fd8c 100644 > > --- a/fs/xfs/xfs_buf.c > > +++ b/fs/xfs/xfs_buf.c > > @@ -1684,10 +1684,12 @@ xfs_wait_buftarg( > > struct xfs_buf *bp; > > bp = list_first_entry(&dispose, struct xfs_buf, b_lru); > > list_del_init(&bp->b_lru); > > - if (bp->b_flags & XBF_WRITE_FAIL) { > > + if (bp->b_flags & XBF_WRITE_FAIL && > > + ___ratelimit(&bp->b_target->bt_ioerror_rl, > > + "XFS: Corruption Alert")) { > > xfs_alert(btp->bt_mount, > > "Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!", > > - (long long)bp->b_bn); > > + (long long)bp->b_bn); > > xfs_alert(btp->bt_mount, > > "Please run xfs_repair to determine the extent of the problem."); > > } > > I think if we are tossing away metadata here, we should probably > shut down the filesystem once the loop has completed. That way we > get all the normal warnings about running xfs_repair and don't have > to open code it here... > > > - > > STATIC uint > > xfs_buf_item_push( > > struct xfs_log_item *lip, > > @@ -518,7 +510,7 @@ xfs_buf_item_push( > > > > /* has a previous flush failed due to IO errors? */ > > if ((bp->b_flags & XBF_WRITE_FAIL) && > > - ___ratelimit(&xfs_buf_write_fail_rl_state, "XFS: Failing async write")) { > > + ___ratelimit(&bp->b_target->bt_ioerror_rl, "XFS: Failing async write")) { > > xfs_warn(bp->b_mount, > > "Failing async write on buffer block 0x%llx. Retrying async write.", > > (long long)bp->b_bn); > > This gets simplified to: > > if (bp->b_flags & XBF_WRITE_FAIL) { > xfs_buf_alert_ratelimited(bp, "XFS: Failing async write", > "Failing async write on buffer block 0x%llx. Retrying async write.", > (long long)bp->b_bn); > } > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >