On Mon, May 22, 2017 at 01:33:05PM -0500, Eric W. Biederman wrote: > Ram Pai <linuxram@xxxxxxxxxx> writes: > > > On Mon, May 15, 2017 at 03:10:38PM -0500, Eric W. Biederman wrote: > >> > >> It was observed that in some pathlogical cases that the current code > >> does not unmount everything it should. After investigation it was > >> determined that the issue is that mnt_change_mntpoint can can change > >> which mounts are available to be unmounted during mount propagation > >> which is wrong. > >> > >> The trivial reproducer is: > >> $ cat ./pathological.sh > >> > >> mount -t tmpfs test-base /mnt > >> cd /mnt > >> mkdir 1 2 1/1 > >> mount --bind 1 1 > >> mount --make-shared 1 > >> mount --bind 1 2 > >> mount --bind 1/1 1/1 > >> mount --bind 1/1 1/1 > >> echo > >> grep test-base /proc/self/mountinfo > >> umount 1/1 > >> echo > >> grep test-base /proc/self/mountinfo > >> > >> $ unshare -Urm ./pathological.sh > >> > >> The expected output looks like: > >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 > >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> > >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 > >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> > >> The output without the fix looks like: > >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 > >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> > >> 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000 > >> 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> 52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000 > >> > >> That last mount in the output was in the propgation tree to be unmounted but > >> was missed because the mnt_change_mountpoint changed it's parent before the walk > >> through the mount propagation tree observed it. > >> > > > > Looks patch correct to me. > > Reviewed-by: Ram Pai <linuxram@xxxxxxxxxx> > > > > BTW: The logic of find_topper() looks not-so-accurate to me. Why dont we > > explicitly flag tucked mounts with MNT_TUCKED, and use that information > > to determine if the child is really a topper? Currently we determine > > the topper if it entirely is covering. How do we diambiguate that from an > > entirely-covering-mount that is explicitly mounted by the administrator? > > A topper situation is applicable only when tucked, right? > > In the current code explictly does not care about the difference. > The code just restricts untucking mounts of any kind to umount > propagation. > > This is where we have previously disagreed. > > A short summary of our previous discussions: > Eric Biederman: find_topper makes tucked mounts ordinary mounts and is simple. > Eric Biederman: I don't see a compelling case for a MNT_TUCKED flag > Eric Biederman: I think the change is a nice behavioral improvement > Ram Pai: a MNT_TUCKED flag would perfectly preserve existing behavior > Ram Pai: find_topper while not perfect is better than the previous > very special case for side/shadow mounts > > With respect to backwards compatibility the set of bugs I am fixing > shows that it is possible to have some very egregious bugs in this > area and in practice no one cares. > > > Without a MNT_TUCKED flag I can readily tell what the following > code should do by simply inspection of the of the mount > propgation information in /proc/self/mountinfo: > Step 1> $ mount -t tmpfs test-base /mnt Step 2> $ cd /mnt Step 3> $ mkdir -p 1 2 1/1 Step 4> $ mount --bind 1 1 Step 5> $ mount --make-shared 1 Step 6> $ mount --bind 1 2 Step 7> $ mount --bind 1/1 1/1 Step 8> $ mount --bind 1/1 1/1 Step 9> $ umount 1/1 > > Before the umount /proc/self/mountinfo shows: > 46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=502,gid=502 > 47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > 52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=502,gid=502 > > So it is clear to me that umount /mnt/1/1 should just leave: > /mnt > /mnt/1 > /mnt/2 This is one other place where we disagree.... I expect things to peel back to the state the trees were, when the Step 7 was executed. which is /mnt /mnt/1 /mnt/2 /mnt/1/1 /mnt/2/1 And all tucked mounts disapper. Dont get me wrong. I dont think we will agree because we have different expections. There is no standard on what to expect. Someone authoritative; may be Al Viro, has to define what to expect. > > I would argue that is what the code should always have done. > > I believe the code with a MNT_TUCKED flag would leave: > /mnt > /mnt/1 > /mnt/1/1 > /mnt/2 > But in truth it makes my head hurt to think about it. Yes it is extremely mind-bending; sometimes mind-bleeding. :-( > I don't see that MNT_TUCKED adds anything except aditional code > complexity. > > I don't actually see what the value is in keeping mounts that you can > not use (because they are overmounted) around. I argue that MNT_TUCKED leaves markers that can be used to determine what can be taken out and what needs to be kept. I will stop here and say.. there is value in marking TUCKED mounts. Someone will run into some obscure issue in the future; probably a decade from now, and the same story will repeat. I wish there was a mathematical formula, where you plugin the operation and a state of the trees, and the new state of the mount-trees emerge. For now your patches look good to me. RP > > If the scenarios we were talking about were all limited to perfoming a > mount and then undoing that mount I could almost see some value in a > MNT_TUCKED flag. Given that one of the justications for tucking mounts > in the first place is what happens when you umount something on a slave > mount I really don't like it. As now I get the question what happens > on a slave mount where a mount has been propagated and tucked, and > then the topper is unmounted and a new topper is added. Should unmount > on the parent untuck the propagated mount or leave it there? It was > propagated it was tucked but it wasn't tucked under what is currenty on > top. > > I much prefer the current semantics where we just say mount propagation > can tuck and untuck things, and the history of how the mount tree got > into its current shape is not important. > > Given how difficult it has been to make this code performant and correct > I am not particularly eager to add complexity for unnecessary bug > compatibility. But if it creates a breaking regression for something > (other than a regression test) I am willing to add MNT_TUCKED. > > Eric -- Ram Pai