On Wed, May 06, 2020 at 08:36:43AM +1000, Dave Chinner wrote: > On Tue, May 05, 2020 at 07:58:10AM -0400, Brian Foster wrote: > > On Tue, May 05, 2020 at 07:53:07AM +1000, Dave Chinner wrote: > > > On Mon, May 04, 2020 at 10:11:37AM -0400, Brian Foster wrote: > > > > Hi all, > > > > > > > > I think everything has been reviewed to this point. Only minor changes > > > > noted below in this release. A git repo is available here[1]. > > > > > > > > The only outstanding feedback that I'm aware of is Dave's comment on > > > > patch 7 of v3 [2] regarding the shutdown assert check. I'm not aware of > > > > any means to get through xfs_wait_buftarg() with a dirty buffer that > > > > hasn't undergone the permanant error sequence and thus shut down the fs. > > > > > > # echo 0 > /sys/fs/xfs/<dev>/fail_at_unmount > > > > > > And now any error with a "retry forever" config (the default) will > > > be collected by xfs_buftarg_wait() without a preceeding shutdown as > > > xfs_buf_iodone_callback_error() will not treat it as a permanent > > > error during unmount. i.e. this doesn't trigger: > > > > > > /* At unmount we may treat errors differently */ > > > if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount) > > > goto permanent_error; > > > > > > and so the error handling just marks it with a write error and lets > > > it go for a write retry in future. These are then collected in > > > xfs_buftarg_wait() as nothing is going to retry them once unmount > > > gets to this point... > > > > > > > That doesn't accurately describe the behavior of that configuration, > > though. "Retry forever" means that dirty buffers are going to cycle > > through submission retries and the unmount is going to hang indefinitely > > (on pushing the AIL). Indeed, preventing this unmount hang is the > > original purpose of the fail at unmount knob (commit here[1]). > > So this patch is relying on something else hanging before we get to > xfs_wait_buftarg and hence "we don't have to handle that here". > Yes, though it's the existing code that relies on the external push to block until everything is written back or a permanent error is issued. This patch just splits up and rate limits a warning message. I added the assert because of the previous discussion around making sure this situation is handled properly and because that error processing dependency is not obvious. > Please document that assumption along with the ASSERT, because if we > change AIL behaviour to prevent retry-forever hangs with freeze > and/or remount-ro we're going to hit this assert... > Sure, I'll add a comment.. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >