[PATCH] fs: unset MNT_WRITE_HOLD on failure

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 20 Apr 2022 15:19:25 +0200

After mnt_hold_writers() has been called, MNT_WRITE_HOLD will always have been
set, so mnt_hold_writers() must always be paired with mnt_unhold_writers().
After the recent cleanup in [1], where Al switched from a do-while to a for
loop, the cleanup path fails to unset MNT_WRITE_HOLD for the first mount that
was changed. Fix this by making sure that the first mount is cleaned up as
well, and add some comments to make this more obvious.
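A minimal user-space model of the failure, assuming the mount tree is flattened into an array in next_mnt() order. `struct fake_mount`, `hold_writers()`, `unhold_writers()`, `hold_all()` and both cleanup helpers are hypothetical stand-ins, not kernel API. The model only illustrates two points: the hold flag survives a failed hold, and an exclusive `p != m` bound never reaches the mount where the failure happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_WRITE_HOLD 0x1u /* hypothetical stand-in for MNT_WRITE_HOLD */
#define FAKE_EBUSY      16   /* stand-in for EBUSY */

struct fake_mount {
	unsigned int flags;
	int writers;             /* active writers on this mount */
};

/* Like mnt_hold_writers(): the flag is set before writers are checked,
 * so it is still set when this returns an error. */
static int hold_writers(struct fake_mount *m)
{
	m->flags |= FAKE_WRITE_HOLD;
	return m->writers > 0 ? -FAKE_EBUSY : 0;
}

static void unhold_writers(struct fake_mount *m)
{
	m->flags &= ~FAKE_WRITE_HOLD;
}

/* Hold writers on each mount in order; return the failing index, or n. */
static size_t hold_all(struct fake_mount *mnts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (hold_writers(&mnts[i]))
			return i;
	return n;
}

/* Old shape: stops before the failing mount, which keeps its flag. */
static void cleanup_exclusive(struct fake_mount *mnts, size_t fail)
{
	for (size_t p = 0; p != fail; p++)
		if (mnts[p].flags & FAKE_WRITE_HOLD)
			unhold_writers(&mnts[p]);
}

/* Fixed shape: includes the failing mount, then stops. */
static void cleanup_inclusive(struct fake_mount *mnts, size_t n, size_t fail)
{
	assert(fail < n); /* only called on the error path */
	for (size_t p = 0; p < n; p++) {
		if (mnts[p].flags & FAKE_WRITE_HOLD)
			unhold_writers(&mnts[p]);
		if (p == fail)
			break;
	}
}

static bool any_held(const struct fake_mount *mnts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (mnts[i].flags & FAKE_WRITE_HOLD)
			return true;
	return false;
}
```

For example, with writers on the second of three mounts, `hold_all()` returns 1; `cleanup_exclusive()` then clears only the first mount, while `cleanup_inclusive()` clears both. With a single mount the exclusive loop body never runs at all, since the failing mount is also the starting mount.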

Reported-by: syzbot+10a16d1c43580983f6a2@xxxxxxxxxxxxxxxxxxxxxxxxx
Reported-by: syzbot+306090cfa3294f0bbfb3@xxxxxxxxxxxxxxxxxxxxxxxxx
Fixes: e257039f0fc7 ("mount_setattr(): clean the control flow and calling conventions") [1]
Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@xxxxxxxxxx
Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@xxxxxxxxxx
Cc: Hillf Danton <hdanton@xxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Christian Brauner (Microsoft) <brauner@xxxxxxxxxx>
---
This should fix the syzbot issue. This is only relevant for making a
mount or mount tree read-only:
1. successful recursive read-only mount tree change:
   Cleanup loop isn't executed.
2. failed recursive read-only mount tree change:
   m will point to the mount we failed to handle. The cleanup loop will
   run until p == m and then terminate.
3. successful single read-only mount change:
   Cleanup loop won't be executed.
4. failed single read-only mount change:
   m will point to mnt and the cleanup loop will terminate if p == m.
I don't think there are any other weird corner cases, since we know that
MNT_WRITE_HOLD can only have been set by us: setting it requires
lock_mount_hash(), which we hold. So unconditionally unsetting it is
fine. But please take a close look at the changed loop.
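The four cases listed above can be checked against a small model of the read-only branch with the patched cleanup loop. This is a sketch under stated assumptions: `struct model_mount`, its `next` pointer (standing in for next_mnt()), `model_chain()`, `model_hold_writers()` and `model_prepare_ro()` are hypothetical, and the permission and idmap checks of the real function are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WRITE_HOLD 0x1u /* hypothetical stand-in for MNT_WRITE_HOLD */
#define MODEL_EBUSY      16   /* stand-in for EBUSY */

struct model_mount {
	unsigned int flags;
	int writers;              /* active writers on this mount */
	struct model_mount *next; /* next mount in traversal order, NULL at end */
};

/* Link mnts[0..n-1] into one traversal chain and clear their state. */
static void model_chain(struct model_mount *mnts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		mnts[i].flags = 0;
		mnts[i].writers = 0;
		mnts[i].next = (i + 1 < n) ? &mnts[i + 1] : NULL;
	}
}

/* Sets the hold flag first, so it stays set when this fails. */
static int model_hold_writers(struct model_mount *m)
{
	m->flags |= MODEL_WRITE_HOLD;
	return m->writers > 0 ? -MODEL_EBUSY : 0;
}

/* Read-only branch of a prepare step, shaped like the patched loop. */
static int model_prepare_ro(struct model_mount *mnt, bool recurse)
{
	struct model_mount *m;
	int err = 0;

	assert(mnt != NULL);
	for (m = mnt; m; m = m->next) {
		err = model_hold_writers(m);
		if (err)
			break;
		if (!recurse)
			return 0; /* case 3: mnt stays held for the caller */
	}

	if (err) {
		struct model_mount *p;

		/* Cases 2 and 4: walk from mnt up to and including m. The flag
		 * test mirrors the patch, where a mount that failed an earlier
		 * check was never held. */
		for (p = mnt; p; p = p->next) {
			if (p->flags & MODEL_WRITE_HOLD)
				p->flags &= ~MODEL_WRITE_HOLD;
			if (p == m)
				break;
		}
	}
	return err; /* case 1: loop ran off the end with err == 0 */
}
```

On success the holds are left in place, since in this model a later step would release them; on failure no mount is left holding writers.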
---
 fs/namespace.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a0a36bfa3aa0..afe2b64b14f1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4058,10 +4058,22 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	if (err) {
 		struct mount *p;
 
-		for (p = mnt; p != m; p = next_mnt(p, mnt)) {
+		/*
+		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
+		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
+		 * mounts and needs to take care to include the first mount.
+		 */
+		for (p = mnt; p; p = next_mnt(p, mnt)) {
 			/* If we had to hold writers unblock them. */
 			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
 				mnt_unhold_writers(p);
+
+			/*
+			 * We're done once the first mount we changed got
+			 * MNT_WRITE_HOLD unset.
+			 */
+			if (p == m)
+				break;
 		}
 	}
 	return err;

base-commit: b2d229d4ddb17db541098b83524d901257e93845
-- 
2.32.0