On Mon, Nov 02, 2020 at 05:50:29PM +0100, Alexey Gladkov wrote: > Preface > ------- > These patches are for binding the rlimits to a user in the user namespace. > This patch set can be applied on top of: > > git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.8-2-g43e210d68200 > > Problem > ------- > Some rlimits are set per user: RLIMIT_NPROC, RLIMIT_MEMLOCK, RLIMIT_SIGPENDING, > RLIMIT_MSGQUEUE. When several containers are created from one user then > the processes inside the containers influence each other. > > Eric W. Biederman mentioned this issue [1][2][3]. > > Introduced changes > ------------------ > To fix this problem, you can bind the counter of the specified rlimits to the > user within the user namespace. By default, to preserve backward compatibility, > only the initial user namespace is used. This patch adds one more prctl > parameter to change the binding to the user namespace. > > This will not cause the user to take more resources than allowed in the parent > user namespace because it only virtualizes the rlimit counter. Limits in all > parent user namespaces are taken into account. > > For example, this allows us to run multiple containers by the same user and > set the RLIMIT_NPROC to 1 inside. Thanks for picking this up and working on it. This would definitely fix many issues for folks running unprivileged containers using a single id map which is the default behavior for LXC/LXD and so very valuable to us. Christian > > ToDo > ---- > * RLIMIT_MEMLOCK, RLIMIT_SIGPENDING and RLIMIT_MSGQUEUE are not implemented. > * No documentation. > * No tests. > > [1] https://lore.kernel.org/containers/87imd2incs.fsf@xxxxxxxxxxxxxxxxxxxxx/ > [2] https://lists.linuxfoundation.org/pipermail/containers/2020-August/042096.html > [3] https://lists.linuxfoundation.org/pipermail/containers/2020-October/042524.html > > Changelog > --------- > v1: > * After discussion with Eric W. Biederman, I increased the size of ucounts to > atomic_long_t. > * Added ucount_max to avoid the fork bomb. > > -- > > Alexey Gladkov (4): > Increase size of ucounts to atomic_long_t > Move the user's process counter to ucounts > Do not allow fork if RLIMIT_NPROC is exceeded in the user namespace > tree > Allow to change the user namespace in which user rlimits are counted > > fs/exec.c | 13 ++++++--- > fs/io-wq.c | 25 +++++++++++++----- > fs/io-wq.h | 1 + > fs/io_uring.c | 1 + > include/linux/cred.h | 8 ++++++ > include/linux/sched.h | 3 +++ > include/linux/sched/user.h | 1 - > include/linux/user_namespace.h | 12 +++++++-- > include/uapi/linux/prctl.h | 5 ++++ > kernel/cred.c | 44 ++++++++++++++++++++++++------- > kernel/exit.c | 2 +- > kernel/fork.c | 13 ++++++--- > kernel/sys.c | 26 ++++++++++++++++-- > kernel/ucount.c | 48 +++++++++++++++++++++++++++++----- > kernel/user.c | 3 ++- > kernel/user_namespace.c | 3 +++ > 16 files changed, 171 insertions(+), 37 deletions(-) > > -- > 2.25.4 >