On Thu, Aug 08, 2024 at 09:18:51AM +1000, Dave Chinner wrote: > On Wed, Aug 07, 2024 at 08:29:45PM +0200, Jan Kara wrote: > > Hello, > > > > this patch series implements generic handling of filesystem shutdown. The idea > > is very simple: Have a superblock flag, which when set, will make VFS refuse > > modifications to the filesystem. The patch series consists of several parts. > > Patches 1-6 cleanup handling of SB_I_ flags which is currently messy (different > > flags seem to have different locks protecting them although they are modified > > by plain stores). Patches 7-12 gradually convert code to be able to handle > > errors from sb_start_write() / sb_start_pagefault(). Patch 13 then shows how > > filesystems can use this generic flag. Additionally, we could remove some > > shutdown checks from within ext4 code and rely on checks in VFS but I didn't > > want to complicate the series with ext4 specific things. > > Overall this looks good. Two things that I noticed that we should > nail down before anything else: > > 1. The original definition of a 'shutdown filesystem' (i.e. from the > XFS origins) is that a shutdown filesystem must *never* do -physical > IO- after the shutdown is initiated. This is a protection mechanism > for the underlying storage to prevent potential propagation of > problems in the storage media once a serious issue has been > detected. (e.g. suspect physical media can be made worse by > continually trying to read it.) It also allows the block device to > go away and we won't try to access issue new IO to it once the > ->shutdown call has been complete. > > IOWs, XFS implements a "no new IO after shutdown" architecture, and > this is also largely what ext4 implements as well. I don't think it quite does -- for EXT4_GOING_FLAGS_DEFAULT, it sets the shutdown flag, but it doesn't actually abort the journal. I think that's an implementation bug since XFS /does/ shut down the log. But looking at XFS_FSOP_GOING_FLAGS_DEFAULT, I also notice that if the bdev_freeze fails, it returns 0 and the fs isn't shut down. ext4, otoh, actually does pass bdev_freeze's errno along. I think ext4's behavior is the correct one, right? > However, this isn't what this generic shutdown infrastructure > implements. It only prevents new user modifications from being > started - it is effectively a "instant RO" mechanism rather than an > "instant no more IO" architecture. I thought pagefaults are still allowed on a shutdown xfs? Curiously I don't see a prohibition on write faults, but iirc we still allowed read faults so that a shutdown on the rootfs doesn't immediately crash the whole machine? > Hence we have an impedence mismatch between existing shutdown > implementations that currently return -EIO on shutdown for all > operations (both read and write) and this generic implementation > which returns -EROFS only for write operations. > > Hence the proposed generic shutdown model doesn't really solve the > inconsistent shutdown behaviour problem across filesystems - it just > adds a new inconsistency between existing filesystem shutdown > implementations and the generic infrastructure. > > 2. On shutdown, this patchset returns -EROFS. > > As per #1, returning -EROFS on shutdown will be a significant change > of behaviour for some filesystems as they currently return -EIO when > the filesystem is shut down. > > I don't think -EROFS is right, because existing shutdown behaviour > also impacts read-only operations and will return -EIO for them, > too. > > I think the error returned by a shutdown filesystem should always be > consistent and that really means -EIO needs to be returned rather > than -EROFS. > > However, given this is new generic infrastructure, we can define a > new error like -ESHUTDOWN (to reuse an existing errno) or even a > new errno like -EFSSHUTDOWN for this, document it man pages and then > convert all the existing filesystem shutdown checks to return this > error instead of -EIO... Agree. > > Also, as Dave suggested, we can lift *_IOC_{SHUTDOWN|GOINGDOWN} ioctl handling > > to VFS (currently in 5 filesystems) and just call new ->shutdown op for > > the filesystem abort handling itself. But that is kind of independent thing > > and this series is long enough as is. > > Agreed - that can be done separately once we've sorted out the > little details of what a shutdown filesystem actually means and how > that gets reported consistently to userspace... I would define it as: No more writes to the filesystem or its underlying storage; file IO and stat* calls return ESHUTDOWN; read faults still allowed. I like the idea of hoisting this to the vfs and defining how one figures out if the fs is shut down; thank you for working on this, Jan. --D > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >