Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > On Tue, Aug 12, 2014 at 03:17:10AM -0700, Eric W. Biederman wrote: >> I have rebased my changes against vfs.git#for-eric and my changes work >> just fine on top of the base you have built. The changes are avaiable >> in user-namespace.git#vfs-detach-mounts10 so you just be able to just >> pull the changes in. >> >> Reading your pile #1 pull request to Linus it sounds like you are >> planning to suck all of this into the vfs tree. > > I am. Questions: > * is there any reason why we need list instead of hlist for > per-mountpoint list of mounts? Looks like hlist would do just as > well, and it's a bit less noise I can't think of a reason we can't use a hlist_head in for m_list in struct mountpoint. Hmm. I do use list_add_tail in mnt_set_mountpoint but other than good hygiene that does not seem for any particular reason because __detach_mounts processes all of the list entries. So going from newest to oldest or oldest to newest shouldn't matter. We do need it to be a doubly linked list so umount_tree and detach_mount remain constant time operations. But both list and hlist satisfy that requirement. So as far as I can tell the only reason I used a list is because every list through the mounts at the time I added the code was a list and later it has not been important enough to matter. > * __d_unalias() change looks rather odd. What we do there > is _not_ "avoid leaking mounts", it's "don't get a bunch of existing > mounts suddenly relocate". What's up with that one? The change to __d_unalias is semantically the same as the change vfs_rename with the d_mountpoint tests going away. The user space visible behavior change is to allow rename in one mount namespace even if there is a mount on that directory in another mount namespace. In the case of __d_unalias the mount namespace the rename happened in is on a remote computer or otherwise not part of the vfs. Allowing renames and unlinks in one mount namespace even when there is a mount point in another mount namespace is important to avoid breaking unix semantics and to prevent unprivileged users from causing unlink or rename of more privileged users in other mount namespaces to fail by placing mounts on the more privileged users files and directories. "avoid leaking mounts" is simply what had to be implemented so that we could support arbitrary unlinks. >From an implementation standpoint allowing the rename means that filesystems no longer have to be concerned with a vfs cache that is out of sync with the actual filesystem. I agree that mount points moving is a weird semantic, but it is not a particularly new semantic. We already allow this in 3.16 and before if we rename the parent directory of a mount point. When we discussed what to do with renames of mounts in the review of the patch no examples could be found where we actually cared (it appears no one is silly enough to put a mount point where someone else could rename it), and the alternative of keeping some kind of overlay mount structure so we could keep a mount in the same place even after the underlying mountpoint was renamed was decided to be more trouble than it was worth. All of the mechanisms for avoiding trouble when a mount point is renamed already exist and fusermount already uses them. I truly hope the due dilligence of research and public discussion (i.e. https://lwn.net/Articles/570338/) and retaining the existing semantics in a single mount namespace of what we are doing with renames is sufficient to avoid breaking some weird usespace application. Breaking peoples applications sucks. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html