On Thu, Apr 23, 2020 at 10:29:58AM -0400, Brian Foster wrote:
> On Thu, Apr 23, 2020 at 02:46:04PM +1000, Dave Chinner wrote:
> > On Wed, Apr 22, 2020 at 01:54:21PM -0400, Brian Foster wrote:
> > > At unmount time, XFS emits a warning for every in-core buffer that
> > > might have undergone a write error. In practice this behavior is
> > > probably reasonable given that the filesystem is likely short lived
> > > once I/O errors begin to occur consistently. Under certain test or
> > > otherwise expected error conditions, this can spam the logs and slow
> > > down the unmount.
> > >
> > > We already have a ratelimit state defined for buffers failing
> > > writeback. Fold this state into the buftarg and reuse it for the
> > > unmount time errors.
> > >
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> >
> > Looks fine, but I suspect we both missed something here:
> > xfs_buf_ioerror_alert() was made a ratelimited printk in the last
> > cycle:
> >
> > void
> > xfs_buf_ioerror_alert(
> > 	struct xfs_buf		*bp,
> > 	xfs_failaddr_t		func)
> > {
> > 	xfs_alert_ratelimited(bp->b_mount,
> > "metadata I/O error in \"%pS\" at daddr 0x%llx len %d error %d",
> > 			func, (uint64_t)XFS_BUF_ADDR(bp), bp->b_length,
> > 			-bp->b_error);
> > }
> >
>
> Yeah, I hadn't noticed that.
>
> > Hence I think all these buffer error alerts can be brought under the
> > same rate limiting variable. Something like this in xfs_message.c:
> >
>
> One thing to note is that xfs_alert_ratelimited() ultimately uses
> the DEFAULT_RATELIMIT_INTERVAL of 5s. The ratelimit we're generalizing
> here uses 30s (both use a burst of 10). That seems reasonable enough to
> me for I/O errors so I'm good with the changes below.
>
> FWIW, that also means we could just call xfs_buf_alert_ratelimited()
> from xfs_buf_item_push() if we're also Ok with using an "alert" instead
> of a "warn." I'm not immediately aware of a reason to use one over the
> other (xfs_wait_buftarg() already uses alert) so I'll try that unless I
> hear an objection.

Sounds fine to me.

> The xfs_wait_buftarg() ratelimit presumably remains
> open coded because it's two separate calls and we probably don't want
> them to individually count against the limit.

That's why I suggested dropping the second "run xfs_repair" message
and triggering a shutdown after the wait loop. That way we don't
issue "run xfs_repair" for every single failed buffer (largely
noise!), and we get a single, non-rate-limited "run xfs_repair"
message once we have processed all the failed writes.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
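
To make the 5s vs 30s interval difference discussed above concrete, here is a rough userspace sketch of interval/burst rate limiting in C. It is not the kernel's ratelimit code and not part of the patch: struct rl_state, rl_allow() and count_emitted() are hypothetical names, and time is passed in as plain seconds so the behaviour is easy to check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a ratelimit state; not kernel code. */
struct rl_state {
	long interval;	/* window length, in seconds */
	int  burst;	/* messages allowed per window */
	long begin;	/* start time of the current window */
	int  printed;	/* messages let through in this window */
	bool started;	/* false until the first event is seen */
};

/* Return true if a message at time 'now' (seconds) may be emitted. */
static bool rl_allow(struct rl_state *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Open a fresh window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->started = true;
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;	/* over the burst: suppress */
	rs->printed++;
	return true;
}

/* Count how many of n events, spaced 'step' seconds apart, get through. */
static int count_emitted(long interval, int burst, int n, long step)
{
	struct rl_state rs = { .interval = interval, .burst = burst };
	int emitted = 0;

	for (int i = 0; i < n; i++)
		if (rl_allow(&rs, i * step))
			emitted++;
	return emitted;
}
```

Under these assumptions, 100 errors arriving one second apart yield 40 messages with a 30s interval and a burst of 10 (count_emitted(30, 10, 100, 1)), whereas with a 5s interval and the same burst nothing is suppressed, because only five events land in each window.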