Re: [PATCH] fs: Provide helpers for manipulating sb->s_readonly_remount

Christian Brauner <brauner@xxxxxxxxxx> · Sat, 17 Jun 2023 17:05:25 +0200

On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote:
> On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > Provide helpers to set and clear sb->s_readonly_remount including
> > appropriate memory barriers. Also use this opportunity to document what
> > the barriers pair with and why they are needed.
> > 
> > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > Signed-off-by: Jan Kara <jack@xxxxxxx>
> 
> The helper conversion looks fine so from that perspective the patch
> looks good.
> 
> However, I'm not sure the use of memory barriers is correct, though.
> 
> IIUC, we want mnt_is_readonly() to return true when ever
> s_readonly_remount is set. Is that the behaviour we are trying to
> acheive for both ro->rw and rw->ro transactions?
> 
> > ---
> >  fs/internal.h      | 26 ++++++++++++++++++++++++++
> >  fs/namespace.c     | 10 ++++------
> >  fs/super.c         | 17 ++++++-----------
> >  include/linux/fs.h |  2 +-
> >  4 files changed, 37 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index bd3b2810a36b..01bff3f6db79 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> >  extern bool mount_capable(struct fs_context *);
> >  int sb_init_dio_done_wq(struct super_block *sb);
> >  
> > +/*
> > + * Prepare superblock for changing its read-only state (i.e., either remount
> > + * read-write superblock read-only or vice versa). After this function returns
> > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > + * caller is able to observe any changes done by the remount. This holds until
> > + * sb_end_ro_state_change() is called.
> > + */
> > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > +{
> > +	WRITE_ONCE(sb->s_readonly_remount, 1);
> > +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> > +	smp_wmb();
> > +}
> 
> I'm not sure how this wmb pairs with the memory barrier in
> mnt_is_readonly() to provide the correct behavior. The barrier in
> mnt_is_readonly() happens after it checks s_readonly_remount, so
> the s_readonly_remount in mnt_is_readonly is not ordered in any way
> against this barrier.
> 
> The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> and MNT_READONLY are ordered after s_readonly_remount(), but we
> don't change those flags until a long way after s_readonly_remount
> is set.
> 
> Hence if this is a ro->rw transistion, then I can see that racing on
> s_readonly_remount being isn't an issue, because the mount/sb
> flags will have SB_RDONLY/MNT_READONLY set and the correct thing
> will be done (i.e. consider code between sb_start_ro_state_change()
> and sb_end_ro_state_change() is RO).
> 
> However, it's not obvious (to me, anyway) how this works at all for
> a rw->ro transition - if we race on s_readonly_remount being set
> then we'll consider the fs to still be read-write regardless of the
> smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or
> MNT_READONLY are set at this point.

Let me try and remember it all. I've documented a good portion of this
in the relevant functions but I should probably upstream some more
longer documentation blurb as well.

A rw->ro transition happen in two ways.

(1) A mount or mount tree is made read-only via
    mount_setattr(MNT_ATTR_READONLY) or
    mount(MS_BIND|MS_RDONLY|MS_REMOUNT).
(2) The filesystems/superblock is made read-only via fspick()+fsconfig()
    or mount(MS_REMOUNT|MS_RDONLY).

For both (1) and (2) we grab lock_mount_hash() in relevant codepaths
(because that's required for any vfsmount->mnt_flags changes) and then
call mnt_hold_writers().

mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags
before checking the write counter of that mount to see whether there are
any active writers on that mount. If there are any active writers we'll
fail mnt_hold_writers() and the whole rw->ro transition.

A memory barrier is used to order raising MNT_WRITE_HOLD against the
increment of the write counter of that mount in __mnt_want_write().
If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after
it incremented the write counter it will spin until MNT_WRITE_HOLD is
cleared via mnt_unhold_writers(). This uses another memory barrier to
ensure ordering with the mnt_is_readonly() check in __mnt_want_write().

__mnt_want_write() doesn't know about the ro/rw state of the mount at
all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly().

If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was
cleared __mnt_want_write() will back off. If not write access to the
mount is granted.

A superblock rw->ro transition is done the same way. It also requires
mnt_hold_writers() to be done. This is done in
sb_prepare_remount_readonly() which is called in reconfigure_super().

Only after mnt_hold_writers() has been raised successfully on every
mount of that filesystem (i.e., all bind mounts) will
sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and
mnt_is_readonly() is called sb->s_readonly_remount is guaranteed to be
visible or MNT_READONLY or SB_RDONLY are visible. The memory barrier in
sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't
protect/order the rw->ro transition itself.

(The only exception is an emergency read-only remount where we don't
know what state the fs is in and don't care for any active writers on
that superblock so omit wading through all the mounts of that
filesystem. But that's only doable from withing the kernel via
SB_FORCE.)

Provided I understand your question/concern correctly.