On Fri, Apr 24, 2020 at 07:12:32AM -0400, Brian Foster wrote:
> On Fri, Apr 24, 2020 at 07:14:37AM +1000, Dave Chinner wrote:
> > On Thu, Apr 23, 2020 at 10:29:58AM -0400, Brian Foster wrote:
> > > On Thu, Apr 23, 2020 at 02:46:04PM +1000, Dave Chinner wrote:
> > > > On Wed, Apr 22, 2020 at 01:54:21PM -0400, Brian Foster wrote:
> > > > > At unmount time, XFS emits a warning for every in-core buffer
> > > > > that might have undergone a write error. In practice this
> > > > > behavior is probably reasonable given that the filesystem is
> > > > > likely short lived once I/O errors begin to occur
> > > > > consistently. Under certain test or otherwise expected error
> > > > > conditions, this can spam the logs and slow down the unmount.
> > > > >
> > > > > We already have a ratelimit state defined for buffers failing
> > > > > writeback. Fold this state into the buftarg and reuse it for
> > > > > the unmount time errors.
> > > > >
> > > > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > >
> > > > Looks fine, but I suspect we both missed something here:
> > > > xfs_buf_ioerror_alert() was made a ratelimited printk in the
> > > > last cycle:
> > > >
> > > > void
> > > > xfs_buf_ioerror_alert(
> > > > 	struct xfs_buf		*bp,
> > > > 	xfs_failaddr_t		func)
> > > > {
> > > > 	xfs_alert_ratelimited(bp->b_mount,
> > > > "metadata I/O error in \"%pS\" at daddr 0x%llx len %d error %d",
> > > > 		func, (uint64_t)XFS_BUF_ADDR(bp), bp->b_length,
> > > > 		-bp->b_error);
> > > > }
> > > >
> > > > Hence I think all these buffer error alerts can be brought
> > > > under the same rate limiting variable. Something like this in
> > > > xfs_message.c:
> > >
> > > One thing to note is that xfs_alert_ratelimited() ultimately uses
> > > the DEFAULT_RATELIMIT_INTERVAL of 5s. The ratelimit we're
> > > generalizing here uses 30s (both use a burst of 10). That seems
> > > reasonable enough to me for I/O errors so I'm good with the
> > > changes below.
> > >
> > > FWIW, that also means we could just call
> > > xfs_buf_alert_ratelimited() from xfs_buf_item_push() if we're
> > > also OK with using an "alert" instead of a "warn." I'm not
> > > immediately aware of a reason to use one over the other
> > > (xfs_wait_buftarg() already uses alert) so I'll try that unless
> > > I hear an objection.
> >
> > Sounds fine to me.
> >
> > > The xfs_wait_buftarg() ratelimit presumably remains open coded
> > > because it's two separate calls and we probably don't want them
> > > to individually count against the limit.
> >
> > That's why I suggested dropping the second "run xfs_repair"
> > message and triggering a shutdown after the wait loop. That way we
> > don't issue "run xfs_repair" for every single failed buffer
> > (largely noise!), and we get a non-rate-limited common "run
> > xfs_repair" message once we have processed all the failed writes.
>
> Sorry, must have missed that in the last reply. I don't think we
> want to shut down here because XBF_WRITE_FAIL only reflects that the
> internal async write retry (e.g. the one historically used to
> mitigate transient I/O errors) has occurred on the buffer, not
> necessarily that the immediately previous I/O has failed.

I think this is an incorrect reading of how XBF_WRITE_FAIL functions.
XBF_WRITE_FAIL is used to indicate that the previous write failed, not
that a historic write failed.

The flag is cleared at buffer submission time - see
xfs_buf_delwri_submit_buffers() and xfs_bwrite() - so it is only set
on buffers whose previous IO failed and which are hence still dirty
and have not been flushed back to disk. If we hit this in
xfs_buftarg_wait() after we've pushed the AIL in xfs_log_quiesce() on
unmount, then we've got write failures that could not be resolved by
repeated retries, and the filesystem is, at this instant in time,
inconsistent on disk. That's a shutdown error condition...

> For that reason I've kind of looked at this particular instance as
> more of a warning that I/O errors have occurred in the past and the
> user might want to verify it didn't result in unexpected damage.
> FWIW, I've observed plenty of these messages long after I've
> disabled error injection and allowed I/O to proceed and the fs to
> unmount cleanly.

Right. That's the whole point of the flag - the buffer has been
dirtied and it hasn't been able to be written back by the time we are
purging the buffer cache at the end of unmount. i.e. when
xfs_buftarg_wait() is called, all buffers should be clean, because we
are about to write an unmount record to mark the log clean once all
the logged metadata is written back.

What we do right now - write an unmount record after failing metadata
writeback - is actually a bug, and that is one of the reasons why I
suggested a shutdown should be done. i.e. we should not be writing an
unmount record to mark the log clean if we failed to write back
metadata.

That metadata is still valid in the journal, and so it should remain
valid in the journal to allow it to be replayed on the next mount.
i.e. retry the writes from log recovery after the hardware failure has
been addressed and the IO errors have gone away.

Tossing the dirty buffers at unmount and then marking the journal
clean actually makes the buffer write failure -worse-. Doing this
guarantees the filesystem is inconsistent on disk (by updating the
journal to indicate those writes actually succeeded) and as a result
absolutely requires xfs_repair to fix.

If we shut down on XBF_WRITE_FAIL buffers in xfs_buftarg_wait(), we
will not write an unmount record and so give the filesystem a chance
to recover on the next mount (e.g. after a power cycle to clear
whatever RAID hardware bug was being hit) and write that dirty
metadata back without failure. If recovery fails with IO errors, then
the user really does need to run repair. However, the situation at
that point is still better than if we write a clean unmount record
after write failures...
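To make the flag semantics concrete, here is a toy userspace model of
that lifecycle. To be clear: this is an illustration, not kernel code -
the flag values and the buf_write() helper are invented, and the real
logic lives in xfs_buf_delwri_submit_buffers(), xfs_bwrite() and the
buffer ioend error handling in fs/xfs/xfs_buf.c:

#include <stdbool.h>
#include <stdio.h>

/* illustrative flag values - not the kernel's */
#define XBF_WRITE	(1u << 0)
#define XBF_WRITE_FAIL	(1u << 1)	/* previous write attempt failed */

struct xfs_buf {
	unsigned int	b_flags;
};

/*
 * Submission clears XBF_WRITE_FAIL before the IO is issued, so the
 * flag can only ever be observed on a buffer whose most recent write
 * attempt failed and which is therefore still dirty.
 */
static void buf_write(struct xfs_buf *bp, bool io_fails)
{
	bp->b_flags |= XBF_WRITE;
	bp->b_flags &= ~XBF_WRITE_FAIL;

	if (io_fails)
		bp->b_flags |= XBF_WRITE_FAIL;	/* done in the ioend path */
}

int main(void)
{
	struct xfs_buf bp = { .b_flags = 0 };

	buf_write(&bp, true);		/* transient IO error */
	printf("after failed write: WRITE_FAIL=%d\n",
	       !!(bp.b_flags & XBF_WRITE_FAIL));

	buf_write(&bp, false);		/* retry succeeds */
	printf("after good retry:   WRITE_FAIL=%d\n",
	       !!(bp.b_flags & XBF_WRITE_FAIL));
	return 0;
}

Running it prints WRITE_FAIL=1 after the failed write and WRITE_FAIL=0
after the successful retry - the flag carries no memory of historic
failures, which is why seeing it set in xfs_buftarg_wait() means the
last write attempt on that buffer failed.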
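And for the unmount path itself, this is roughly the shape of change
I'm suggesting. Sketch only, not a tested patch: the IO drain at the
top is elided, xfs_buf_alert_ratelimited() is the helper being
discussed above, and the message text and shutdown placement are
illustrative:

void
xfs_wait_buftarg(
	struct xfs_buftarg	*btp)
{
	LIST_HEAD(dispose);
	bool			write_fail = false;

	/* ... wait for in-flight buffer IO to complete, as today ... */

	/* loop until there is nothing left on the lru list. */
	while (list_lru_count(&btp->bt_lru)) {
		list_lru_walk(&btp->bt_lru, xfs_buftarg_wait_rele,
			      &dispose, LONG_MAX);

		while (!list_empty(&dispose)) {
			struct xfs_buf	*bp;

			bp = list_first_entry(&dispose,
					struct xfs_buf, b_lru);
			list_del_init(&bp->b_lru);
			if (bp->b_flags & XBF_WRITE_FAIL) {
				write_fail = true;
				xfs_buf_alert_ratelimited(bp,
					"XFS: Corruption Alert",
"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
					(long long)bp->b_bn);
			}
			xfs_buf_rele(bp);
		}
	}

	/*
	 * Shutting down here means we never get to writing the unmount
	 * record, so the log stays dirty and the next mount replays
	 * the journal and retries these writes, rather than us
	 * guaranteeing an inconsistent filesystem by marking the log
	 * clean.
	 */
	if (write_fail) {
		xfs_alert(btp->bt_mount,
"Failed metadata writeback at unmount; run xfs_repair if log recovery fails.");
		xfs_force_shutdown(btp->bt_mount, SHUTDOWN_META_IO_ERROR);
	}
}

That way we emit one "run xfs_repair" hint per unmount instead of one
per failed buffer, and because the filesystem is shut down,
xfs_log_unmount_write() will not stamp the log clean, leaving the
journal intact for replay on the next mount.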
Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx