Ram Pai <linuxram@xxxxxxxxxx> writes: > On Tue, May 30, 2017 at 10:07:49AM -0500, Eric W. Biederman wrote: >> Ram Pai <linuxram@xxxxxxxxxx> writes: >> >> > On Wed, May 17, 2017 at 12:54:34AM -0500, Eric W. Biederman wrote: >> >> >> >> While investigating some poor umount performance I realized that in >> >> the case of overlapping mount trees where some of the mounts are locked >> >> the code has been failing to unmount all of the mounts it should >> >> have been unmounting. >> >> >> >> This failure to unmount all of the necessary >> >> mounts can be reproduced with: >> >> >> >> $ cat locked_mounts_test.sh >> >> >> >> mount -t tmpfs test-base /mnt >> >> mount --make-shared /mnt >> >> mkdir -p /mnt/b >> >> >> >> mount -t tmpfs test1 /mnt/b >> >> mount --make-shared /mnt/b >> >> mkdir -p /mnt/b/10 >> >> >> >> mount -t tmpfs test2 /mnt/b/10 >> >> mount --make-shared /mnt/b/10 >> >> mkdir -p /mnt/b/10/20 >> >> >> >> mount --rbind /mnt/b /mnt/b/10/20 >> >> >> >> unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi' >> >> sleep 1 >> >> umount -l /mnt/b >> >> wait %% >> >> >> >> $ unshare -Urm ./locked_mounts_test.sh >> >> >> >> This failure is corrected by removing the prepass that marks mounts >> >> that may be umounted. >> >> >> >> A first pass is added that umounts mounts if possible and if not sets >> >> mount mark if they could be unmounted if they weren't locked and adds >> >> them to a list to umount possibilities. This first pass reconsiders >> >> the mounts parent if it is on the list of umount possibilities, ensuring >> >> that information of umoutability will pass from child to mount parent. >> >> >> >> A second pass then walks through all mounts that are umounted and processes >> >> their children unmounting them or marking them for reparenting. >> >> >> >> A last pass cleans up the state on the mounts that could not be umounted >> >> and if applicable reparents them to their first parent that remained >> >> mounted. >> >> >> >> While a bit longer than the old code this code is much more robust >> >> as it allows information to flow up from the leaves and down >> >> from the trunk making the order in which mounts are encountered >> >> in the umount propgation tree irrelevant. >> > >> > Eric, >> > >> > I tried multiple time to understand the algorithm, but failed >> > to understand the reasoning behind each of the steps. Hence >> > I can't tell if the algorithm is correct or wrong. >> > >> > I know you are trying to optimize the current algorithm, >> > but what is the key insight that you are trying to leverage >> > to optimize it? That probably might help me analyze the >> > algorithm. >> > >> > >> > You walk the propogation tree, and for each element in the >> > propagation-tree you try to unmount its entire mount-tree. >> > (not sure if this operation is correct, since I know, I had >> > given an example in the past where this can go wrong). >> >> I think you are refering to when I tried to propgate the entire tree >> mount and was not following the individual propagation trees for >> each mount. Which left some mounts mounted that if we had >> followed the individual propgation trees would have been umounted. >> >> This code does not do anything like that it simply follows the >> individual umount propgation trees. >> >> > And later if you find that the unmount is successful, you try >> > to walk up and see if the parent can also be unmounted(dont know >> > why this is needed). >> >> > Sorry, but if you can help with some key insights, it will help. >> >> The first insight is that the parent child relationship that happens >> between mounts in the set of mounts removed by MNT_DETACH is not >> necessarily the same parent child relationship between the mounts >> the are propagated to. >> >> Which leads to the second insight that we can not guarantee that during >> umount when the mount propgation tree is being walked we can not >> gaurantee that the leaves are being walked first. >> > > If you agree that " (A) tucked mounts do/should NOT receive propagation > events on its root dentry ", than i strongly think; short of asserting, > that the propagation tree will always reach the child first > and not the parent. I will further verify my thoughts before > elevating my strong-thinking to an assertion. What led to the implementation of these mounts as tucked mounts rather than side/shadow mounts is Al Viro's assertion that these are normal mounts. As normal mounts I tend to think they should be treated normally. My experience to date suggests that except for artifically created pathlogical cases the possible ways we can treat tucked mounts does not appear to matter. So I do not yet agree to (A). I don't think we have ever actually implemented (A). We have implemented something very similar but the act of untucking a mount resulted in what was at the root of a dentry receving events during the same umount MNT_DETACH instance. All that is important for the optimizations in patch 2/2 is that resolving the underying mount is a separate and final pass from dealing with mount propgation. So your (A) could be compatible with the code in patch 1/2. > But if you do not agree with (A) than yes we could reach the > parent or the child first and we will badly need your algorithm. Please note my example does not involve any tucked mounts. I see this issue with a very ordinary set of mounts and mount propagation. So I think we have rough agreement that my algorithm in patch 1/2 is needed. > BTW: This is the same disagreement that we had earlier regarding > the semantics of tucked mounts. > > Assuming that you may not agree with (A), I looked through your > explaination and the code/algorithm and the steps make sense. > The one thing that was not clear to me was -- is it ok to umount > child whose ancestor is MNT_LOCKED? Your algorithm seems > to allow that. If you have a tree of mounts: With the root of the tree not locked onto it's mountpoint and the rest of the mounts in the tree locked onto the mountpoint. Yes it is ok to unmount the tree. (Just not the individual pieces). While user space has access to the mounts it is necessary to keep the pieces locked together. MNT_LOCKED guards against seeing the mountpint or any of the files or directories in the mountpoint. MNT_LOCKED is a mechanism for dealing with privilege differences between mount namespaces. >> Without tucked mounts, without locked mounts that information is enough >> to say the original mount propgation code for umount was incorrect as it >> assumed that the leaves would be walked first. >> >> I believe the test case I have given above is an example of that. It >> has locked mounts in it as well which made everything doubly ugly. >> >> To handle the case that the code may visit the parent before the child >> if something is mounted on top of a mount we wish to unmount it is >> added to the to_restore list instead of the to_umount list. >> >> If a mount does not have children and it is unmounted __propgate_umount >> returns true. The code then looks at the parent mount and it is on >> the to_restore list it tries to unmount the parent again. >> >> That is the leaf to root propagation. >> >> >> >> There is also root to leaf propgation in umount_list to handle the case of >> locked mounts. Locked mounts come about because a more privileged user >> propagated them into your mount namespace as a set, and the unprivileged >> user is not allowed to break up the set lest they see something under a >> mount they should not see. >> >> When a mount that could be unmounted if it was not locked to it's parent >> it is marked and placed on the to_restore list. >> >> >> When examining children umount_one removes mounts from their parents >> mnt_mounts list. >> >> Children that fully cover a mount (toppers) are ignored. >> Children that if not locked would be unmounted are ignored >> as those children become unmountable if their parent >> is unmountable. > > I find the children can get unmounted even when one its ancestor is > locked. An example case is -- if the child is a leaf and one of its > ancestor is locked, but is not part of any related propagation tree. I am not quite certain what you are asserting. Locking is an operation to hide the contents of a mountpoint. So a leaf that is not locked to it's parent is perfectly fine to unmount. >> >> This allows a tree of mounts that is locked together to >> be unmounted if the root is unmountable. >> >> >> Does that help? > > Yes your explaination helped, > > Sorry it took some time to get to this, Thank you for making the time to get to this and have the conversation. These are tough issues to dig through, and we aren't having the easiest of conversations. Eric