Re: mnt_list corruption triggered during btrfs/326

Christian Brauner <brauner@xxxxxxxxxx> · Thu, 9 Jan 2025 13:51:56 +0100

On Tue, Jan 07, 2025 at 04:00:34PM +0100, Christian Brauner wrote:
> > > Can you please try and reproduce this with
> > > commit 211364bef4301838b2e1 ("fs: kill MNT_ONRB")
> > 
> > This patch should indirectly address both errors but it does not
> > explain why the flag is sometimes missing.
> 
> Yeah, I'm well aware that's why I didn't fast-track it.
> I just didn't have the time to think about this yet.

I think I know how it happens.

btrfs_get_tree_subvol()
{
	mnt = fc_mount();
	// Register the newly allocated mount with sb->s_mounts:
	lock_mount_hash();
	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
	unlock_mount_hash();
}

So now it's public on sb->s_mounts.

Concurrently someone does a ro remount:

reconfigure_super()
-> sb_prepare_remount_readonly()
   {
           list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
                   /* mnt_flags updates, see below */
           }
   }

This walks all mounts registered in sb->s_mounts and raises
MNT_WRITE_HOLD, then raises MNT_READONLY, and then removes
MNT_WRITE_HOLD.

This can happen concurrently with mount_subvol() because sb->s_umount
isn't held anymore:

-> mount_subvol()
   -> mount_subtree()
      -> alloc_mnt_ns()
         mnt_add_to_ns()
         vfs_path_lookup()
         put_mnt_ns()

The flag modification in mnt_add_to_ns() races with the flag modification
of the read-only remount. Both are plain read-modify-write updates of
mnt->mnt.mnt_flags, so MNT_ONRB might be lost...
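
To make the interleaving concrete, here's a small userspace sketch that
replays one possible ordering by hand. It is not kernel code: the DEMO_*
bit values and the function name are made up for illustration, and the
load/store split of the remount side is written out explicitly instead of
relying on real concurrency.

```c
/* Hypothetical bit values for this demo; not the kernel's definitions. */
#define DEMO_WRITE_HOLD 0x1u
#define DEMO_ONRB       0x2u

/*
 * Replay one interleaving of two unsynchronized read-modify-write
 * updates of the same flags word, with the load and the store of the
 * remount side split apart by hand.
 */
static unsigned int replay_lost_update(void)
{
	unsigned int flags = 0;
	unsigned int remount_copy;

	/* remount side: "flags |= WRITE_HOLD" has already completed */
	flags |= DEMO_WRITE_HOLD;

	/* remount side starts "flags &= ~WRITE_HOLD": load the old value */
	remount_copy = flags;

	/* mount side runs "flags |= ONRB" in between */
	flags |= DEMO_ONRB;

	/* remount side finishes: modify the stale copy and store it back */
	remount_copy &= ~DEMO_WRITE_HOLD;
	flags = remount_copy;

	return flags;
}
```

With this ordering the returned word no longer has DEMO_ONRB set, because
the remount side stored back a value it loaded before the mount side's
update.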

If that's correct, then a) we know how this happens and b) killing
MNT_ONRB is the correct fix for this.
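
A rough userspace sketch of why that would help, assuming only that
killing MNT_ONRB means the "is on the rbtree" state no longer lives in the
shared flags word. The struct, its field names and DEMO_WRITE_HOLD are
hypothetical stand-ins, not the kernel's definitions, and this does not
reproduce the actual commit.

```c
#include <stdbool.h>

/* Hypothetical bit value for this demo; not the kernel's definition. */
#define DEMO_WRITE_HOLD 0x1u

/* Hypothetical stand-in for a mount; field names are illustrative only. */
struct demo_mount {
	unsigned int flags;	/* shared word, updated by the remount side */
	bool on_tree;		/* attachment state kept outside the flags word */
};

/*
 * Replay a stale read-modify-write on the flags word while the mount
 * side records its state in a separate field.
 */
static bool replay_with_separate_state(void)
{
	struct demo_mount m = { .flags = 0, .on_tree = false };
	unsigned int remount_copy;

	/* remount side: set the hold bit, then load for the later clear */
	m.flags |= DEMO_WRITE_HOLD;
	remount_copy = m.flags;

	/* mount side: record attachment without touching m.flags */
	m.on_tree = true;

	/* remount side: store the stale copy back into the flags word */
	remount_copy &= ~DEMO_WRITE_HOLD;
	m.flags = remount_copy;

	return m.on_tree;
}
```

The stale store still happens, but it only overwrites the flags word, so
the attachment state survives.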