On Thu, Oct 13, 2016 at 2:46 PM, Andrei Vagin <avagin@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 13, 2016 at 02:53:46PM -0500, Eric W. Biederman wrote:
>>
>> Andrei Vagin pointed out that the time to execute propagate_umount can
>> go non-linear (and take a ludicrous amount of time) when the mount
>> propagation trees of the mounts to be unmounted by a lazy unmount
>> overlap.
>>
>> Solve this in the most straightforward way possible, by adding a new
>> mount flag to mark parts of the mount propagation tree that have been
>> visited, and use that mark to skip parts of the mount propagation tree
>> that have already been visited during an unmount. This guarantees
>> that each mountpoint in the possibly overlapping mount propagation
>> trees will be visited exactly once.
>>
>> Add the functions propagation_visit_next and propagation_revisit_next
>> to coordinate setting and clearing the visited mount mark.
>>
>> Here is a script to generate such a mount tree:
>> $ cat run.sh
>> mount -t tmpfs test-mount /mnt
>> mount --make-shared /mnt
>> for i in `seq $1`; do
>>         mkdir /mnt/test.$i
>>         mount --bind /mnt /mnt/test.$i
>> done
>> cat /proc/mounts | grep test-mount | wc -l
>> time umount -l /mnt
>> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done
>>
>> Here are the performance numbers with and without the patch:
>>
>> mounts | before | after (real sec)
>> ----------------------------------
>>   1024 |  0.071 | 0.024
>>   2048 |  0.184 | 0.030
>>   4096 |  0.604 | 0.040
>>   8192 |  4.471 | 0.043
>>  16384 | 34.826 | 0.082
>>  32768 |        | 0.151
>>  65536 |        | 0.289
>> 131072 |        | 0.659
>>
>> Fixing this performance problem, reported by Andrei Vagin, is part of
>> the work to fix CVE-2016-6213.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Reported-by: Andrei Vagin <avagin@xxxxxxxxxx>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>
>> Andrei, can you take a look at this patch and see if you can spot any
>> problems? My limited testing suggests this approach does a much better
>> job of solving the problem you were seeing, with the time now looking
>> almost linear in the number of mounts.
>
> I read this patch and I like the idea.
>
> Then I ran my tests, and one of them doesn't work with this patch.
> I haven't found the reason yet.

>> +	for (m = propagation_visit_next(parent, parent); m;
>> +	     m = propagation_visit_next(m, parent)) {
>> 		struct mount *child = __lookup_mnt_last(&m->mnt,
>> 						mnt->mnt_mountpoint);

The reason is that this loop is called for a different "mnt" each time,
but with this optimization it is executed only once. So I think the idea
of marking the parent will not work, because one parent can have several
children that have to be unmounted.

> Here is the test:
>
> [root@fc24 mounts]# cat run.sh
> set -e
> mount -t tmpfs zdtm /mnt
> mkdir -p /mnt/1 /mnt/2
> mount -t tmpfs zdtm /mnt/1
> mount --make-shared /mnt/1
> for i in `seq $1`; do
>         mount --bind /mnt/1 `mktemp -d /mnt/1/test.XXXXXX`
> done
> mount --rbind /mnt/1 /mnt/2
> cat /proc/self/mountinfo | grep zdtm | wc -l
> time umount -l /mnt/1
> cat /proc/self/mountinfo | grep zdtm | wc -l
> umount /mnt/2
>
> [root@fc24 mounts]# unshare -Urm ./run.sh 5
> 65
>
> real    0m0.014s
> user    0m0.000s
> sys     0m0.004s
> 33
> umount: /mnt/2: target is busy
>         (In some cases useful info about processes that
>          use the device is found by lsof(8) or fuser(1).)
>
> Thanks,
> Andrei
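
For readers following the patch description above, here is a minimal
user-space sketch of the mark-and-skip idea, i.e. of visiting each node
of possibly overlapping trees exactly once. It is only a model: node,
visit and revisit below are illustrative stand-ins, not the kernel's
struct mount, propagation_visit_next and propagation_revisit_next.

/*
 * Sketch of mark-and-skip traversal over overlapping trees.
 * Not kernel code; simplified stand-in names throughout.
 */
#include <stdbool.h>
#include <stdio.h>

struct node {
        int id;
        bool visited;           /* stand-in for the new mount flag */
        struct node *child;     /* first child in the tree */
        struct node *next;      /* next sibling */
};

/* First pass: mark each node and skip subtrees already marked. */
static void visit(struct node *n)
{
        if (!n || n->visited)   /* the mark is what keeps this linear */
                return;
        n->visited = true;
        printf("visiting node %d\n", n->id);
        for (struct node *c = n->child; c; c = c->next)
                visit(c);
}

/* Second pass: clear the marks so a future unmount starts clean. */
static void revisit(struct node *n)
{
        if (!n || !n->visited)
                return;
        n->visited = false;
        for (struct node *c = n->child; c; c = c->next)
                revisit(c);
}

int main(void)
{
        /* Two trees that overlap in a shared subtree. */
        struct node shared = { .id = 3 };
        struct node a = { .id = 1, .child = &shared };
        struct node b = { .id = 2, .child = &shared };

        /* The shared subtree is reachable from both, visited once. */
        visit(&a);
        visit(&b);
        revisit(&a);
        revisit(&b);
        return 0;
}

Running this prints "visiting node" for 1, 3 and 2 only once each, even
though node 3 is reachable from both trees, which is the exactly-once
guarantee the commit message claims.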
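
And here is a toy model of the failure Andrei describes for the loop
quoted above: because the visited mark persists across the
per-mountpoint calls, a parent already visited for one child is skipped
when the loop runs again for a sibling child mounted elsewhere on it.
The names below (parent_mnt, umount_one) are hypothetical; this models
the control flow only, not the kernel data structures.

/*
 * Toy model: one parent, two children at different mountpoints.
 * Not kernel code; hypothetical names.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct parent_mnt {
        bool visited;
        const char *children[2];   /* children keyed by mountpoint */
};

/* Mimics the patched loop: skip the parent if already visited. */
static void umount_one(struct parent_mnt *p, const char *mountpoint)
{
        if (p->visited) {
                printf("parent already visited, %s is missed\n",
                       mountpoint);
                return;
        }
        p->visited = true;
        for (int i = 0; i < 2; i++)
                if (p->children[i] && !strcmp(p->children[i], mountpoint))
                        printf("unmounting child at %s\n", mountpoint);
}

int main(void)
{
        struct parent_mnt p = { .children = { "test.1", "test.2" } };

        umount_one(&p, "test.1");   /* works: parent is unmarked */
        umount_one(&p, "test.2");   /* skipped: the mark persists */
        return 0;
}

The second call finds nothing because the parent was marked by the
first, which is consistent with the test output above: some mounts are
left behind after the lazy unmount, and /mnt/2 stays busy.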