On Wed, 04 Feb 2015 10:10:55 +0800, Qu Wenruo wrote: > *** Please DON'T merge this patch, it's only for disscusion purpose *** > > There are sysfs interfaces in some fs, only btrfs yet, which will modify > on-disk data. > Unlike normal file operation routine we can use mnt_want_write_file() to > protect the operation, change through sysfs won't to be binded to any file > in the filesystem. > > So introduce new sb_want_write() to do the protection agains a super > block, which acts much like mnt_want_write() but will return success if > the super block is read-write. > > Since sysfs handler don't go through the normal vfsmount, so it won't > increase the refcount of and even we have sb_want_write() waiting sb to > be unfrozen, the fs can still be unmounted without problem. > Causing the modules unable to be removed and user can find out what's > wrong until > > To solve such problem, we have different strategies to solve it. > 1) Extra check on last instance umount of a sb > This is the method the patch uses. > This method seems valid enough, since we want to get write protection on > a sb, so it's OK for the sb if there is *ANY* mount instance. > Problem 1.1) > But lsof and other tools won't help if sb_want_write() on frozen fs cause > it unable to be unmounted. > > Problem 1.2) > When get namespace involved, things will get more complicated. > Like the following case: > Alice | Bob > Mount devA on /mnt1 in her ns | Mount devA on /mnt2/ in his ns > freeze /mnt1 | > sb_want_write() (waiting) | > umount /mnt1 (success since there is | > another mount instance) | > | umount /mnt2 (fail since there > | is sb_want_write() waiting) > > So Alice can't thaw the fs since there is no mount point for it now. > > 2) Don't allow any umount of the sb if there is sb_want_write(). > More aggressive one, purpose by Miao Xie. > Can't resolve problem 1.1) but will solve problem 1.2). This is one of the two methods that I told you, but not the one I recommended. What I wanted to recommend is that thaw the fs at the beginning of the sb kill process, and in sb_want_write(), we check if the sb is active or not after we pass sb_start_write, if the sb is not active, go back. (This way also is not so good, but better than the above one) > Although introduced new problem like the following: > Alice > Mount devA on /mnt1 > freeze /mnt1 > sb_want_write() (waiting) > mount devA on /mnt2 and /mnt3 > > /mnt[123] all can't be unmounted, but new mount can still be created. > > 3) sb_want_write() doesn't make any sense and break VFS rules! > Action which will change on-disk data should not be tunable through sysfs, > and sb_want_write() things which by-pass all the VFS check is just evil. > And for btrfs, we already have the ioctl to set label, why bothering new > sysfs interface to do it again? > > Although I use method 1) to do it, I am still not certain about which is > method is the correct one. > > So any advise is welcomed. > > Thanks, > Qu [SNIP] > +/** > + * sb_want_write - get write acess to a super block > + * @sb: the superblock of the filesystem > + * > + * This tells the low-level filesystem that a write is about to be performed to > + * it, and makes sure that the writes are allowed (superblock is read-write, > + * filesystem is not frozen) before returning success. > + * When the write operation is finished, sb_drop_write() must be called. > + * This is much like mnt_want_write() as a refcount, but only needs > + * the superblock to be read-write. > + */ > +int sb_want_write(struct super_block *sb) > +{ > + spin_lock(&sb->s_want_write_lock); > + if (sb->s_want_write_block) { > + spin_unlock(&sb->s_want_write_lock); > + return -EBUSY; > + } > + sb->s_want_write_count++; > + spin_unlock(&sb->s_want_write_lock); > + > + sb_start_write(sb); > + if (sb->s_readonly_remount || sb->s_flags & MS_RDONLY) { If someone remount the fs to R/O here(after the check), we should not continue to change label/features. I think we need add some check in remount functions. Thanks Miao -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html