Ram Pai <linuxram@xxxxxxxxxx> writes: > On Fri, Feb 03, 2017 at 11:54:21PM +1300, Eric W. Biederman wrote: >> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes: >> >> > Ram Pai <linuxram@xxxxxxxxxx> writes: >> > >> >> On Sat, Jan 21, 2017 at 05:15:29PM +1300, Eric W. Biederman wrote: >> >>> Ram Pai <linuxram@xxxxxxxxxx> writes: >> >>> >> >>> >> @@ -359,12 +373,24 @@ int propagate_mount_busy(struct mount *mnt, int refcnt) >> >>> >> >> >>> >> for (m = propagation_next(parent, parent); m; >> >>> >> m = propagation_next(m, parent)) { >> >>> >> - child = __lookup_mnt_last(&m->mnt, mnt->mnt_mountpoint); >> >>> >> - if (child && list_empty(&child->mnt_mounts) && >> >>> >> - (ret = do_refcount_check(child, 1))) >> >>> >> - break; >> >>> >> + int count = 1; >> >>> >> + child = __lookup_mnt(&m->mnt, mnt->mnt_mountpoint); >> >>> >> + if (!child) >> >>> >> + continue; >> >>> >> + >> >>> >> + /* Is there exactly one mount on the child that covers >> >>> >> + * it completely whose reference should be ignored? >> >>> >> + */ >> >>> >> + topper = find_topper(child); >> >>> > >> >>> > This is tricky. I understand it is trying to identify the case where a >> >>> > mount got tucked-in because of propagation. But this will not >> >>> > distinguish the case where a mount got over-mounted genuinely, not because of >> >>> > propagation, but because of explicit user action. >> >>> > >> >>> > >> >>> > example: >> >>> > >> >>> > case 1: (explicit user action) >> >>> > B is a slave of A >> >>> > mount something on A/a , it will propagate to B/a >> >>> > and than mount something on B/a >> >>> > >> >>> > case 2: (tucked mount) >> >>> > B is a slave of A >> >>> > mount something on B/a >> >>> > and than mount something on A/a >> >>> > >> >>> > Both case 1 and case 2 lead to the same mount configuration. >> >>> > >> >>> > >> >>> > however 'umount A/a' in case 1 should fail. >> >>> > and 'umount A/a' in case 2 should pass. >> >>> > >> >>> > Right? in other words, umounts of 'tucked mounts' should pass(case 2). >> >>> > whereas umounts of mounts on which overmounts exist should >> >>> > fail.(case 1) >> >>> >> >>> Looking at your example. I agree that case 1 will fail today. >> >> >> >> And should continue to fail. right? Your semantics change will pass it. >> > >> > I don't see why it should continue to fail. >> > >> >>> However my actual expectation would be for both mount configurations >> >>> to behave the same. In both cases something has been explicitly mounted >> >>> on B/a and something has propagated to B/a. In both cases the mount >> >>> on top is what was explicitly mounted, and the mount below is what was >> >>> propagated to B/a. >> >>> >> >>> I don't see why the order of operations should matter. >> >> >> >> One of the subtle expectation is reversibility. >> >> >> >> Mount followed immediately by unmount has always passed and that is the >> >> standard expectation always. Your proposed code will ensure that. >> >> >> >> However there is one other subtle expectaton. >> >> >> >> A mount cannot disappear if a user has explicitly mounted on top of it. >> >> >> >> your proposed code will not meet that expectation. >> >> >> >> In other words, these two expectations make it behave differently even >> >> when; arguably, they feel like the same configuration. >> > >> > I am not seeing that. >> > >> > >> > >> >>> >> >>> > maybe we need a flag to identify tucked mounts? >> >>> >> >>> To preserve our exact current semantics yes. >> >>> >> >>> The mount configurations that are delibearately constructed that I am >> >>> aware of are comparatively simple. I don't think anyone has even taken >> >>> advantage of the shadow/side mounts at this point. I made a reasonable >> >>> effort to find out and no one was even aware they existed. Much less >> >>> what they were. And certainly no one I talked to could find code that >> >>> used them. >> >> >> >> But someday; even if its after a decade, someone ;) will >> >> stumble into this semantics and wonder 'why?'. Its better to get it right >> >> sooner. Sorry, I am blaming myself; for keeping some of the problems >> >> open thinking no one will bump into them. >> > >> > Oh definitely. If we have people ready to talk it through I am happy to >> > dot as many i's and cross as many t's as we productively can. >> > >> > I was just pointing out that I don't have any reason to expect that any >> > one depends on the subtle details of the implementation today so we >> > still have some wiggle room to fix them. Even if they are visible to >> > user space. >> >> So I haven't seen a reply, and we are getting awfully close to the merge >> window. Is there anything concrete we can do to ease concerns? >> >> Right now I am thinking my last version of the patch is the likely the >> best we have time and energy to manage and it would be good to merge it >> before the code bit rots. > > I was waiting for some other opinions on the behavior, since I > continue to think that 'one should not be able to unmount mounts on > which a user has explicitly mounted upon'. I am happy to be overruled, > since your patch significantly improves the rest of the semantics. > > Viro? Ram Pai, just to be clear you were hoping to add the logic below to my patch? My objections to the snippet below are: - It makes it hard for the CRIU folks (yet more state they have to find and restore). - It feels subjectively worse to me. - We already have cases where mounts are unmounted transparently (umount on rmdir). - Al Viro claims that the side/shadow mounts are ordinary mounts and maintaining this extra logic that remembers if we tucked one mount under another seems to make this them less ordinary. - The symmetry for unmounting exists for a tucked mount. We can unmount it via propagation or we can unmount the mount above it, and then we can unmount the new underlying mount. So I don't see why we don't want symmetry in the other case just because we mounted on top of the mount and rather than had the mount tucked under us. diff --git a/fs/namespace.c b/fs/namespace.c index 8bfad42c1ccf..8b00e0548438 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2047,8 +2047,10 @@ static int attach_recursive_mnt(struct mount *source_mnt, hlist_del_init(&child->mnt_hash); q = __lookup_mnt(&child->mnt_parent->mnt, child->mnt_mountpoint); - if (q) + if (q) { mnt_change_mountpoint(child, smp, q); + child->mnt.mnt_flags |= MNT_TUCKED; + } commit_tree(child); } put_mountpoint(smp); diff --git a/fs/pnode.c b/fs/pnode.c index 5bc7896d122a..e2a6ac68feb9 100644 --- a/fs/pnode.c +++ b/fs/pnode.c @@ -327,6 +327,9 @@ static struct mount *find_topper(struct mount *mnt) /* If there is exactly one mount covering mnt completely return it. */ struct mount *child; + if (!(mnt->mnt.mnt_flags & MNT_TUCKED)) + return NULL; + if (!list_is_singular(&mnt->mnt_mounts)) return NULL; diff --git a/include/linux/mount.h b/include/linux/mount.h index 8e0352af06b7..25ca398b19b3 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -52,6 +52,7 @@ struct mnt_namespace; #define MNT_INTERNAL 0x4000 +#define MNT_TUCKED 0x020000 #define MNT_LOCK_ATIME 0x040000 #define MNT_LOCK_NOEXEC 0x080000 #define MNT_LOCK_NOSUID 0x100000 Eric