Quoting Nikolay Borisov (kernel@xxxxxxxx):
>
>
> On 10/27/2016 05:37 PM, Serge E. Hallyn wrote:
> > Quoting Nikolay Borisov (kernel@xxxxxxxx):
> >>
> >>
> >> On 10/26/2016 08:25 PM, Serge E. Hallyn wrote:
> >>> On Wed, Oct 26, 2016 at 03:40:27PM +0300, Nikolay Borisov wrote:
> >>>> There are container setups which map the same kuids to
> >>>> different containers. In such a situation, the same uids in
> >>>> different containers will map to the same underlying user on
> >>>> the machine (i.e. the same struct user). One implication of
> >>>> this is that the number of processes for that particular user
> >>>> is going to be shared among all the same uids in the
> >>>> containers. This is problematic, as it means a user in
> >>>> containerA can potentially exhaust the process limit such that
> >>>> a user in containerB cannot spawn any processes.
> >>>
> >>> Hi - thanks for the description. Based on that, though, I worry
> >>> that it is a feature we do not want. Nothing explicitly prohibits
> >>> sharing kuids in different containers, but it is sharing. If
> >>> you want greater isolation between two containers, you must not
> >>> share any kuids.
> >>>
> >>> I'm not saying nack, but I am saying it seems a misguided feature
> >>> which could lead people to think sharing uids is safer than it is.
> >>
> >> I agree that in order for this to be considered "secure" it relies on
> >> the assumption that there is no leakage between containers. However,
> >> there are currently setups which rely on this behavior for whatever
> >> (mis)guided reasons. Furthermore, the current design of namespaces
> >> doesn't do anything to prevent such uses. Given this, I don't think
> >> it would be fair to completely disregard them; hence the patch.
> >
> > I somehow had missed the fact that (if I read below correctly) you
> > are actually solving the problem for RLIMIT_NPROC? That's worthwhile
> > then. I thought the ucounts checks were independent and RLIMIT_NPROC
> > failures were still going to mysteriously plague sibling containers.
> >
> > I do still worry about the performance impact of adding the get_ucounts()
> > in those hot paths below. Have you done any perf measurements?
>
> Finally managed to get around to doing some benchmarking. I performed
> tests on 4.9-rc1 with and without my patch. On every kernel I performed
> 3 tests:
>  - 5 stress-ng instances, doing 250k forks in the init_user_ns
>  - 5 stress-ng instances, doing 250k forks in a child of init_user_ns
>    (nesting factor of 1)
>  - 5 stress-ng instances, doing 250k forks in a child of init_user_ns
>    (nesting factor of 5)
>
> I ran every experiment 5 times and computed the average and standard
> deviation. Here is how I invoked stress-ng:
>
> for i in {1..5}; do time /home/projects/kernel-testing/stress-ng/stress-ng -f 5 --fork-ops 250000 ; done
>
> And every unsharing of a namespace was performed by means of "unshare -r".
>
> The results are as follows (times in seconds):
>
>                 Stock 4.9-rc1                    Patched 4.9-rc1-nproc
>                 Real     STDEV  Sys      STDEV   Real     STDEV  Sys      STDEV
> init_user_ns    28.872   0.167  49.714   0.51    28.386   0.168  49.217   0.899
> depth of 1      27.808   0.448  48.869   0.275   28.303   0.225  50.44    0.174
> depth of 5      27.614   0.226  49.336   0.43    28.22    0.737  49.72    0.934
>
> The results are almost identical, with only a 3% difference in sys time
> at a nesting depth of 1, and 1% at a depth of 5. I think the results are
> rather noisy. While the tests were running I also observed the perf
> percentage for (dec|inc)_ulimit, and it was around 0.06%. Very
> negligible.
I think on the performance side of things this should be good. Awesome, thanks for that.
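To make the failure mode in Nikolay's patch description concrete, here is a
minimal shell sketch. It assumes util-linux unshare(1), that unprivileged
user namespaces are enabled, that both namespaces are created by the same
unprivileged host user (so uid 0 in each maps to that user's kuid), and that
the user's default process limit is above 120; the counts 120 and 100 are
arbitrary illustration values, not anything from the patch.

#!/bin/sh
# "Container A" and "container B" are sibling user namespaces created by
# the same host user, so uid 0 in both maps to the same kuid, and the
# kernel charges RLIMIT_NPROC for both against that one struct user.

# Container A: park 120 processes on the shared kuid.
unshare -r sh -c 'for i in $(seq 1 120); do sleep 300 & done; wait' &
A_PID=$!
sleep 2   # give A time to spawn its children

# Container B: lower its own soft RLIMIT_NPROC to 100 and try to fork.
# The fork-time check compares B's limit against the kuid-wide process
# count, which A already pushed past 100, so B sees EAGAIN even though
# B itself has spawned almost nothing.
unshare -r sh -c '
    ulimit -u 100
    for i in 1 2 3 4 5; do
        /bin/true || echo "fork attempt $i failed"
    done'

kill "$A_PID" 2>/dev/null   # stop A; its sleeps exit within 300s anyway

This cross-container interference is exactly what the patch under discussion
aims to eliminate by accounting the limit per namespace.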
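For the nested cases, the quoted one-liner extends naturally. The email does
not spell out the exact nesting command beyond "unshare -r", so treat the
following as one plausible reconstruction of the "depth of 5" runs.

#!/bin/sh
# Sketch of the "depth of 5" benchmark: each unshare -r creates one more
# nested user namespace before stress-ng starts its 5 fork stressors,
# matching the invocation quoted above.
STRESS=/home/projects/kernel-testing/stress-ng/stress-ng

for run in 1 2 3 4 5; do
    time unshare -r unshare -r unshare -r unshare -r unshare -r \
        "$STRESS" -f 5 --fork-ops 250000
done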