Re: [PATCH RFC v6 6/9] vfs: Add sb_want_write() function to get vfsmount from a given sb.

Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> · Thu, 5 Feb 2015 08:32:25 +0800

-------- Original Message --------
Subject: Re: [PATCH RFC v6 6/9] vfs: Add sb_want_write() function to get 
vfsmount from a given sb.
From: Miao Xie <miaoxie@xxxxxxxxxx>
To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>, <linux-btrfs@xxxxxxxxxxxxxxx>
Date: 2015年02月04日 16:09
On Wed, 04 Feb 2015 10:10:55 +0800, Qu Wenruo wrote:
*** Please DON'T merge this patch, it's only for disscusion purpose ***

There are sysfs interfaces in some fs, only btrfs yet, which will modify
on-disk data.
Unlike normal file operation routine we can use mnt_want_write_file() to
protect the operation, change through sysfs won't to be binded to any file
in the filesystem.

So introduce new sb_want_write() to do the protection agains a super
block, which acts much like mnt_want_write() but will return success if
the super block is read-write.

Since sysfs handler don't go through the normal vfsmount, so it won't
increase the refcount of and even we have sb_want_write() waiting sb to
be unfrozen, the fs can still be unmounted without problem.
Causing the modules unable to be removed and user can find out what's
wrong until

To solve such problem, we have different strategies to solve it.
1) Extra check on last instance umount of a sb
This is the method the patch uses.
This method seems valid enough, since we want to get write protection on
a sb, so it's OK for the sb if there is *ANY* mount instance.
Problem 1.1)
But lsof and other tools won't help if sb_want_write() on frozen fs cause
it unable to be unmounted.

Problem 1.2)
When get namespace involved, things will get more complicated.
Like the following case:
	Alice				|		Bob
Mount devA on /mnt1 in her ns		| Mount devA on /mnt2/ in his ns
freeze /mnt1				|
sb_want_write() (waiting)		|
umount /mnt1 (success since there is 	|
another mount instance)			|
					| umount /mnt2 (fail since there
					| is sb_want_write() waiting)

So Alice can't thaw the fs since there is no mount point for it now.

2) Don't allow any umount of the sb if there is sb_want_write().
More aggressive one, purpose by Miao Xie.
Can't resolve problem 1.1) but will solve problem 1.2).
This is one of the two methods that I told you, but not the one I recommended.
What I wanted to recommend is that thaw the fs at the beginning of the
sb kill process, and in sb_want_write(), we check if the sb is active or
not after we pass sb_start_write, if the sb is not active, go back.
(This way also is not so good, but better than the above one)

Although introduced new problem like the following:
	Alice
Mount devA on /mnt1
freeze /mnt1
sb_want_write() (waiting)
mount devA on /mnt2 and /mnt3

/mnt[123] all can't be unmounted, but new mount can still be created.

3) sb_want_write() doesn't make any sense and break VFS rules!
Action which will change on-disk data should not be tunable through sysfs,
and sb_want_write() things which by-pass all the VFS check is just evil.
And for btrfs, we already have the ioctl to set label, why bothering new
sysfs interface to do it again?

Although I use method 1) to do it, I am still not certain about which is
method is the correct one.

So any advise is welcomed.

Thanks,
Qu
[SNIP]

+/**
+ * sb_want_write - get write acess to a super block
+ * @sb: the superblock of the filesystem
+ *
+ * This tells the low-level filesystem that a write is about to be performed to
+ * it, and makes sure that the writes are allowed (superblock is read-write,
+ * filesystem is not frozen) before returning success.
+ * When the write operation is finished, sb_drop_write() must be called.
+ * This is much like mnt_want_write() as a refcount, but only needs
+ * the superblock to be read-write.
+ */
+int sb_want_write(struct super_block *sb)
+{
+	spin_lock(&sb->s_want_write_lock);
+	if (sb->s_want_write_block) {
+		spin_unlock(&sb->s_want_write_lock);
+		return -EBUSY;
+	}
+	sb->s_want_write_count++;
+	spin_unlock(&sb->s_want_write_lock);
Also, such behavior is the same as  rw_sem, so I'll also change it to 
rw_sem to use the existing
infrastructure in next version.

Thanks,
Qu
+
+	sb_start_write(sb);
+	if (sb->s_readonly_remount || sb->s_flags & MS_RDONLY) {
If someone remount the fs to R/O here(after the check), we should not continue
to change label/features. I think we need add some check in remount functions.

Thanks
Miao

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html