On Thu, Aug 08, 2024 at 07:51:41AM -0700, Darrick J. Wong wrote: > On Thu, Aug 08, 2024 at 09:18:51AM +1000, Dave Chinner wrote: > > On Wed, Aug 07, 2024 at 08:29:45PM +0200, Jan Kara wrote: > > > Hello, > > > > > > this patch series implements generic handling of filesystem shutdown. The idea > > > is very simple: Have a superblock flag, which when set, will make VFS refuse > > > modifications to the filesystem. The patch series consists of several parts. > > > Patches 1-6 cleanup handling of SB_I_ flags which is currently messy (different > > > flags seem to have different locks protecting them although they are modified > > > by plain stores). Patches 7-12 gradually convert code to be able to handle > > > errors from sb_start_write() / sb_start_pagefault(). Patch 13 then shows how > > > filesystems can use this generic flag. Additionally, we could remove some > > > shutdown checks from within ext4 code and rely on checks in VFS but I didn't > > > want to complicate the series with ext4 specific things. > > > > Overall this looks good. Two things that I noticed that we should > > nail down before anything else: > > > > 1. The original definition of a 'shutdown filesystem' (i.e. from the > > XFS origins) is that a shutdown filesystem must *never* do -physical > > IO- after the shutdown is initiated. This is a protection mechanism > > for the underlying storage to prevent potential propagation of > > problems in the storage media once a serious issue has been > > detected. (e.g. suspect physical media can be made worse by > > continually trying to read it.) It also allows the block device to > > go away and we won't try to access issue new IO to it once the > > ->shutdown call has been complete. > > > > IOWs, XFS implements a "no new IO after shutdown" architecture, and > > this is also largely what ext4 implements as well. > > I don't think it quite does -- for EXT4_GOING_FLAGS_DEFAULT, it sets the > shutdown flag, but it doesn't actually abort the journal. I think > that's an implementation bug since XFS /does/ shut down the log. > > But looking at XFS_FSOP_GOING_FLAGS_DEFAULT, I also notice that if the > bdev_freeze fails, it returns 0 and the fs isn't shut down. ext4, otoh, > actually does pass bdev_freeze's errno along. I think ext4's behavior > is the correct one, right? Yes, there are inconsistencies in how different filesystems implement user-driven shutdown operations, but Jan has specifically left addressing those sorts of inconsistencies in ioctl/->shutdown implementations for a later patch set. I agree with that approach - let's first focus on defining a generic model for how shutdown filesystems should behave once they are shut down. Once we have the model defined, then we can worry about making filesystems shutdown mechanisms behave consistently within that model.. > > However, this isn't what this generic shutdown infrastructure > > implements. It only prevents new user modifications from being > > started - it is effectively a "instant RO" mechanism rather than an > > "instant no more IO" architecture. > > I thought pagefaults are still allowed on a shutdown xfs? Curiously I > don't see a prohibition on write faults, but iirc we still allowed read > faults so that a shutdown on the rootfs doesn't immediately crash the > whole machine? Yes, page faults are allowed on a shutdown XFS filesystem right up to the point where they need to do IO on a page cache miss. Then the IO request hits the block mapping code (xfs_bmapi_read()), sees the shutdown state and the read IO fails. The result of this is SIGBUS for your executable. IOWs, if the executable is cached, it will keep running after a shutdown. If it's not cached, then it's game over already. > > > Also, as Dave suggested, we can lift *_IOC_{SHUTDOWN|GOINGDOWN} ioctl handling > > > to VFS (currently in 5 filesystems) and just call new ->shutdown op for > > > the filesystem abort handling itself. But that is kind of independent thing > > > and this series is long enough as is. > > > > Agreed - that can be done separately once we've sorted out the > > little details of what a shutdown filesystem actually means and how > > that gets reported consistently to userspace... > > I would define it as: > > No more writes to the filesystem or its underlying storage; file IO > and stat* calls return ESHUTDOWN; read faults still allowed. We need to operations in terms of whether we allow physical IO or not. We currently don't allow physical IO from read faults on XFs, so... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx