On Thu, Oct 13, 2016 at 2:46 PM, Andrei Vagin <avagin@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 13, 2016 at 02:53:46PM -0500, Eric W. Biederman wrote:
>>
>> Andrei Vagin pointed out that the time to execute propagate_umount can
>> go non-linear (and take a ludicrous amount of time) when the mount
>> propagation trees of the mounts to be unmounted by a lazy unmount
>> overlap.
>>
>> Solve this in the most straightforward way possible, by adding a new
>> mount flag to mark parts of the mount propagation tree that have been
>> visited, and use that mark to skip parts of the mount propagation tree
>> that have already been visited during an unmount. This guarantees
>> that each mountpoint in the possibly overlapping mount propagation
>> trees will be visited exactly once.
>>
>> Add the functions propagation_visit_next and propagation_revisit_next
>> to coordinate setting and clearing the visited mount mark.
>>
>> Here is a script to generate such a mount tree:
>> $ cat run.sh
>> mount -t tmpfs test-mount /mnt
>> mount --make-shared /mnt
>> for i in `seq $1`; do
>>         mkdir /mnt/test.$i
>>         mount --bind /mnt /mnt/test.$i
>> done
>> cat /proc/mounts | grep test-mount | wc -l
>> time umount -l /mnt
>> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done
>>
>> Here are the performance numbers with and without the patch:
>>
>> mounts | before | after (real sec)
>> ----------------------------------
>>   1024 |  0.071 | 0.024
>>   2048 |  0.184 | 0.030
>>   4096 |  0.604 | 0.040
>>   8192 |  4.471 | 0.043
>>  16384 | 34.826 | 0.082
>>  32768 |        | 0.151
>>  65536 |        | 0.289
>> 131072 |        | 0.659
>>
>> Fixing this performance problem, reported by Andrei Vagin, is part of
>> the work to fix CVE-2016-6213.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Reported-by: Andrei Vagin <avagin@xxxxxxxxxx>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>
>> Andrei, can you take a look at this patch and see if you can spot any
>> problems? My limited testing suggests this approach does a much better
>> job of solving the problem you were seeing, with the time now looking
>> almost linear in the number of mounts.
>
> I read this patch and I like the idea.
>
> Then I ran my tests, and one of them doesn't work with this patch.
> I haven't found the reason yet.

>> +	for (m = propagation_visit_next(parent, parent); m;
>> +	     m = propagation_visit_next(m, parent)) {
>> 		struct mount *child = __lookup_mnt_last(&m->mnt,
>> 						mnt->mnt_mountpoint);

The reason is that this loop is called for a different "mnt" each time,
but with this optimization it is executed only once. So I think the idea
of marking the parent will not work, because one parent can have several
children that have to be unmounted.

> Here is the test:
>
> [root@fc24 mounts]# cat run.sh
> set -e
> mount -t tmpfs zdtm /mnt
> mkdir -p /mnt/1 /mnt/2
> mount -t tmpfs zdtm /mnt/1
> mount --make-shared /mnt/1
> for i in `seq $1`; do
>         mount --bind /mnt/1 `mktemp -d /mnt/1/test.XXXXXX`
> done
> mount --rbind /mnt/1 /mnt/2
> cat /proc/self/mountinfo | grep zdtm | wc -l
> time umount -l /mnt/1
> cat /proc/self/mountinfo | grep zdtm | wc -l
> umount /mnt/2
>
> [root@fc24 mounts]# unshare -Urm ./run.sh 5
> 65
>
> real    0m0.014s
> user    0m0.000s
> sys     0m0.004s
> 33
> umount: /mnt/2: target is busy
>         (In some cases useful info about processes that
>          use the device is found by lsof(8) or fuser(1).)
>
> Thanks,
> Andrei
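
For readers following the patch description above, here is a minimal
user-space sketch of the mark-and-skip idea, i.e. of visiting each node
of possibly overlapping trees exactly once. It is only a model: node,
visit and revisit below are illustrative stand-ins, not the kernel's
struct mount, propagation_visit_next and propagation_revisit_next.

/*
 * Sketch of mark-and-skip traversal over overlapping trees.
 * Not kernel code; simplified stand-in names throughout.
 */
#include <stdbool.h>
#include <stdio.h>

struct node {
        int id;
        bool visited;           /* stand-in for the new mount flag */
        struct node *child;     /* first child in the tree */
        struct node *next;      /* next sibling */
};

/* First pass: mark each node and skip subtrees already marked. */
static void visit(struct node *n)
{
        if (!n || n->visited)   /* the mark is what keeps this linear */
                return;
        n->visited = true;
        printf("visiting node %d\n", n->id);
        for (struct node *c = n->child; c; c = c->next)
                visit(c);
}

/* Second pass: clear the marks so a future unmount starts clean. */
static void revisit(struct node *n)
{
        if (!n || !n->visited)
                return;
        n->visited = false;
        for (struct node *c = n->child; c; c = c->next)
                revisit(c);
}

int main(void)
{
        /* Two trees that overlap in a shared subtree. */
        struct node shared = { .id = 3 };
        struct node a = { .id = 1, .child = &shared };
        struct node b = { .id = 2, .child = &shared };

        /* The shared subtree is reachable from both, visited once. */
        visit(&a);
        visit(&b);
        revisit(&a);
        revisit(&b);
        return 0;
}

Running this prints "visiting node" for 1, 3 and 2 only once each, even
though node 3 is reachable from both trees, which is the exactly-once
guarantee the commit message claims.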
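
And here is a toy model of the failure Andrei describes for the loop
quoted above: because the visited mark persists across the
per-mountpoint calls, a parent already visited for one child is skipped
when the loop runs again for a sibling child mounted elsewhere on it.
The names below (parent_mnt, umount_one) are hypothetical; this models
the control flow only, not the kernel data structures.

/*
 * Toy model: one parent, two children at different mountpoints.
 * Not kernel code; hypothetical names.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct parent_mnt {
        bool visited;
        const char *children[2];   /* children keyed by mountpoint */
};

/* Mimics the patched loop: skip the parent if already visited. */
static void umount_one(struct parent_mnt *p, const char *mountpoint)
{
        if (p->visited) {
                printf("parent already visited, %s is missed\n",
                       mountpoint);
                return;
        }
        p->visited = true;
        for (int i = 0; i < 2; i++)
                if (p->children[i] && !strcmp(p->children[i], mountpoint))
                        printf("unmounting child at %s\n", mountpoint);
}

int main(void)
{
        struct parent_mnt p = { .children = { "test.1", "test.2" } };

        umount_one(&p, "test.1");   /* works: parent is unmarked */
        umount_one(&p, "test.2");   /* skipped: the mark persists */
        return 0;
}

The second call finds nothing because the parent was marked by the
first, which is consistent with the test output above: some mounts are
left behind after the lazy unmount, and /mnt/2 stays busy.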