Re: [PATCH RFC 0/13] fs: generic filesystem shutdown handling

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 8 Aug 2024 07:51:41 -0700

On Thu, Aug 08, 2024 at 09:18:51AM +1000, Dave Chinner wrote:
> On Wed, Aug 07, 2024 at 08:29:45PM +0200, Jan Kara wrote:
> > Hello,
> > 
> > this patch series implements generic handling of filesystem shutdown. The idea
> > is very simple: Have a superblock flag, which when set, will make VFS refuse
> > modifications to the filesystem. The patch series consists of several parts.
> > Patches 1-6 cleanup handling of SB_I_ flags which is currently messy (different
> > flags seem to have different locks protecting them although they are modified
> > by plain stores). Patches 7-12 gradually convert code to be able to handle
> > errors from sb_start_write() / sb_start_pagefault(). Patch 13 then shows how
> > filesystems can use this generic flag. Additionally, we could remove some
> > shutdown checks from within ext4 code and rely on checks in VFS but I didn't
> > want to complicate the series with ext4 specific things.
> 
> Overall this looks good. Two things that I noticed that we should
> nail down before anything else:
> 
> 1. The original definition of a 'shutdown filesystem' (i.e. from the
> XFS origins) is that a shutdown filesystem must *never* do -physical
> IO- after the shutdown is initiated. This is a protection mechanism
> for the underlying storage to prevent potential propagation of
> problems in the storage media once a serious issue has been
> detected. (e.g. suspect physical media can be made worse by
> continually trying to read it.) It also allows the block device to
> go away and we won't try to access issue new IO to it once the
> ->shutdown call has been complete.
> 
> IOWs, XFS implements a "no new IO after shutdown" architecture, and
> this is also largely what ext4 implements as well.

I don't think it quite does -- for EXT4_GOING_FLAGS_DEFAULT, it sets the
shutdown flag, but it doesn't actually abort the journal.  I think
that's an implementation bug since XFS /does/ shut down the log.

But looking at XFS_FSOP_GOING_FLAGS_DEFAULT, I also notice that if the
bdev_freeze fails, it returns 0 and the fs isn't shut down.  ext4, otoh,
actually does pass bdev_freeze's errno along.  I think ext4's behavior
is the correct one, right?

> However, this isn't what this generic shutdown infrastructure
> implements. It only prevents new user modifications from being
> started - it is effectively a "instant RO" mechanism rather than an
> "instant no more IO" architecture.

I thought pagefaults are still allowed on a shutdown xfs?  Curiously I
don't see a prohibition on write faults, but iirc we still allowed read
faults so that a shutdown on the rootfs doesn't immediately crash the
whole machine?

> Hence we have an impedence mismatch between existing shutdown
> implementations that currently return -EIO on shutdown for all
> operations (both read and write) and this generic implementation
> which returns -EROFS only for write operations.
> 
> Hence the proposed generic shutdown model doesn't really solve the
> inconsistent shutdown behaviour problem across filesystems - it just
> adds a new inconsistency between existing filesystem shutdown
> implementations and the generic infrastructure.
> 
> 2. On shutdown, this patchset returns -EROFS.
> 
> As per #1, returning -EROFS on shutdown will be a significant change
> of behaviour for some filesystems as they currently return -EIO when
> the filesystem is shut down.
> 
> I don't think -EROFS is right, because existing shutdown behaviour
> also impacts read-only operations and will return -EIO for them,
> too.
> 
> I think the error returned by a shutdown filesystem should always be
> consistent and that really means -EIO needs to be returned rather
> than -EROFS.
> 
> However, given this is new generic infrastructure, we can define a
> new error like -ESHUTDOWN (to reuse an existing errno) or even a
> new errno like -EFSSHUTDOWN for this, document it man pages and then
> convert all the existing filesystem shutdown checks to return this
> error instead of -EIO...

Agree.

> > Also, as Dave suggested, we can lift *_IOC_{SHUTDOWN|GOINGDOWN} ioctl handling
> > to VFS (currently in 5 filesystems) and just call new ->shutdown op for
> > the filesystem abort handling itself. But that is kind of independent thing
> > and this series is long enough as is.
> 
> Agreed - that can be done separately once we've sorted out the
> little details of what a shutdown filesystem actually means and how
> that gets reported consistently to userspace...

I would define it as:

No more writes to the filesystem or its underlying storage; file IO
and stat* calls return ESHUTDOWN; read faults still allowed.

I like the idea of hoisting this to the vfs and defining how one figures
out if the fs is shut down; thank you for working on this, Jan.

--D

> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>