Andrey Wagin <avagin@xxxxxxxxx> writes:

> On Tue, Jun 18, 2013 at 02:56:51AM +0400, Andrey Wagin wrote:
>> 2013/6/17 Eric W. Biederman <ebiederm@xxxxxxxxxxxx>:
>> > So for anyone seriously worried about this kind of thing in general we
>> > already have the memory control group, which is quite capable of
>> > limiting this kind of thing, and it limits all memory allocations not
>> > just mount.
>>
>> And that is the problem: we can't limit a particular slab. Let's
>> imagine a real container with 4 GB of RAM. What kernel memory limit is
>> reasonable for it? I set up 64 MB (it may not be enough for a real
>> CT, but it's enough to make the host inaccessible for some minutes).
>>
>> $ mkdir /sys/fs/cgroup/memory/test
>> $ echo $((64 << 20)) > /sys/fs/cgroup/memory/test/memory.kmem.limit_in_bytes
>> $ unshare -m
>> $ echo $$ > /sys/fs/cgroup/memory/test/tasks
>> $ mount --make-rprivate /
>> $ mount -t tmpfs xxx /mnt
>> $ mount --make-shared /mnt
>> $ time bash -c 'set -m; for i in `seq 30`; do mount --bind /mnt \
>>       `mktemp -d /mnt/test.XXXXXX` & done; for i in `seq 30`; do wait; done'
>> real    0m23.141s
>> user    0m0.016s
>> sys     0m22.881s
>>
>> While the last script is running, nobody can read /proc/mounts or
>> mount anything. I don't think that users in other containers will be
>> glad. And this problem is minor compared with unmounting that tree:
>>
>> $ strace -T umount -l /mnt
>> umount("/mnt", MNT_DETACH) = 0 <548.898244>
>>
>> The host is inaccessible; it logs soft-lockup messages and eats 100% CPU.
>
> Eric, do you agree that
> * it is a problem,
> * we currently have no mechanism to prevent this problem,
> * we need to find a way to prevent this problem?

Ugh. I knew mount propagation was annoying semantically, but I had not
realized the implementation was quite so bad. This doesn't happen in
normal operation to normal folks.
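As a back-of-the-envelope model of why the reproducer blows up (this is an approximation, not the kernel's exact propagation bookkeeping): with /mnt marked shared and bind-mounted beneath itself, each new bind propagates to every existing peer, so the number of mounts roughly doubles per iteration:

```sh
#!/bin/sh
# Rough model: each `mount --bind /mnt /mnt/test.XXXXXX` into a shared
# /mnt is propagated to all existing copies of /mnt, approximately
# doubling the mount count. After 30 binds that is on the order of
# 2^30 (~10^9) mounts, which is consistent with umount taking ~549 s.
count=1
i=0
while [ "$i" -lt 30 ]; do
    count=$((count * 2))
    i=$((i + 1))
done
echo "mounts after 30 binds: ~$count"
```

This prints a count of 1073741824, which is why a 64 MB kmem limit is no real protection here: the damage is in walk time over the propagation tree, not just in allocated memory.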
So I don't think this is something where we need to rush in a fix at the
last moment to prevent the entire world from melting down, even for
people using mount namespaces in containers. I do think it is worth
looking at.

Which kernel were you testing? I haven't gotten as far as looking too
closely, but I just noticed that Al Viro has been busy rewriting the
locking of this. So if you aren't testing at least 3.10-rcX, you
probably need to retest.

My thoughts would be: improve the locking as much as possible, and if
that is not enough, keep a measure of how many mounts will be affected,
at least for the umount case, and possibly for the umount -l case. Then
just don't allow the complexity to exceed some limit, so we know things
will happen in a timely manner.

Eric