On Thu, Aug 10, 2023 at 8:05 PM Jan Kara wrote: > > On Fri 04-08-23 11:01:39, Ryusuke Konishi wrote: > > On Thu, Aug 3, 2023 at 8:46 PM Jan Kara wrote: > > > > > > On Wed 02-08-23 17:41:21, Christoph Hellwig wrote: > > > > Use the generic setup_bdev_super helper to open the main block device > > > > and do various bits of superblock setup instead of duplicating the > > > > logic. This includes moving to the new scheme implemented in common > > > > code that only opens the block device after the superblock has allocated. > > > > > > > > It does not yet convert nilfs2 to the new mount API, but doing so will > > > > become a bit simpler after this first step. > > > > > > > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > > > > > > AFAICS nilfs2 could *almost* use mount_bdev() directly and then just do its > > > > > snapshot thing after mount_bdev() returns. But it has this weird logic > > > that: "if the superblock is already mounted but we can shrink the whole > > > dcache, then do remount instead of ignoring mount options". Firstly, this > > > looks racy - what prevents someone from say opening a file on the sb just > > > after nilfs_tree_is_busy() shrinks dcache? Secondly, it is inconsistent > > > with any other filesystem so it's going to surprise sysadmins not > > > intimately knowing nilfs2. Thirdly, from userspace you cannot tell what > > > your mount call is going to do. Last but not least, what is it really good > > > for? Ryusuke, can you explain please? > > > > > > Honza > > > > I think you are referring to the following part: > > > > > if (!s->s_root) { > > ... > > > } else if (!sd.cno) { > > > if (nilfs_tree_is_busy(s->s_root)) { > > > if ((flags ^ s->s_flags) & SB_RDONLY) { > > > nilfs_err(s, > > > "the device already has a %s mount.", > > > sb_rdonly(s) ? "read-only" : "read/write"); > > > err = -EBUSY; > > > goto failed_super; > > > } > > > } else { > > > /* > > > * Try remount to setup mount states if the current > > > * tree is not mounted and only snapshots use this sb. > > > */ > > > err = nilfs_remount(s, &flags, data); > > > if (err) > > > goto failed_super; > > > } > > > } > > > > What this logic is trying to do is, if there is already a nilfs2 mount > > instance for the device, and are trying to mounting the current tree > > (sd.cno is 0, so this is not a snapshot mount), then will switch > > depending on whether the current tree has a mount: > > > > - If the current tree is mounted, it's just like a normal filesystem. > > (A read-only mount and a read/write mount can't coexist, so check > > that, and reuse the instance if possible) > > - Otherwise, i.e. for snapshot mounts only, do whatever is necessary > > to add a new current mount, such as starting a log writer. > > Since it does the same thing that nilfs_remount does, so > > nilfs_remount() is used there. > > > > Whether or not there is a current tree mount can be determined by > > d_count(s->s_root) > 1 as nilfs_tree_is_busy() does. > > Where s->s_root is always the root dentry of the current tree, not > > that of the mounted snapshot. > > I see now, thanks for explanation! But one thing still is not clear to me. > If you say have a snapshot mounted read-write and then you mount the > current snapshot (cno == 0) read-only, you'll switch the whole superblock > to read-only state. So also the mounted snapshot is suddently read-only > which is unexpected and actually supposedly breaks things because you can > still have file handles open for writing on the snapshot etc.. So how do > you solve that? > > Honza One thing I have to tell you as a premise is that nilfs2's snapshot mounts (cno != 0) are read-only. The read-only option is mandatory for nilfs2 snapshot mounts, so remounting to read/write mode will result in an error. This constraint is checked in nilfs_parse_snapshot_option() which is called from nilfs_identify(). In fact, any write mode file/inode operations on a snapshot mount will result in an EROFS error, regardless of whether the coexisting current tree mount is read-only or read/write (i.e. regardless of the read-only flag of the superblock instance). This is mostly (and possibly entirely) accomplished at the vfs layer by checking the MNT_READONLY flag in mnt_flags of the vfsmount structure, and even on the nilfs2 side, iops->permission (=nilfs_permission) rejects write operations on snapshot mounts. Therefore, the problem you pointed out shouldn't occur in the first place since the situation where a snapshot with a handle in write mode suddenly becomes read-only doesn't happen. Unless I'm missing something.. Regards, Ryusuke Konishi