bugzilla-daemon@xxxxxxxxxx writes:

> https://bugzilla.kernel.org/show_bug.cgi?id=215596
>
> Bug ID: 215596
> Summary: Commit 59ec715 breaks systemd LimitNPROC with PrivateUsers
> Product: Other
> Version: 2.5
> Hardware: All
> OS: Linux
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Other
> Assignee: other_other@xxxxxxxxxxxxxxxxxxxx
> Reporter: etienne@xxxxxxxxxxxx
> CC: ebiederm@xxxxxxxxxxxx, mkoutny@xxxxxxxx, solar@xxxxxxxxxxxx
> Regression: Yes
>
> Commit 59ec715 "ucounts: Fix rlimit max values check", first included in Linux
> 5.15.12, breaks systemd "LimitNPROC" (RLIMIT_NPROC) when combined with
> "PrivateUsers" (user namespacing).
>
> This can be reproduced with a trivial systemd service file:
>
>   [Service]
>   User=nobody
>   PrivateUsers=yes
>   LimitNPROC=4
>   Type=oneshot
>   ExecStart=/bin/true
>
> Which, on 59ec715, fails with:
>
>   Failed to execute /bin/true: Resource temporarily unavailable
>   Failed at step EXEC spawning /bin/true: Resource temporarily unavailable
>   Main process exited, code=exited, status=203/EXEC
>
> (Even though user `nobody` has no running processes besides this one)
>
> A strace on PID 1 reveals the following sequence of calls (excerpt):
>
>   clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
>   child_tidptr=0x40e60150) = 129
>   [pid 129] prlimit64(0, RLIMIT_NPROC, {rlim_cur=4, rlim_max=4}, NULL) = 0
>   [pid 129] unshare(CLONE_NEWUSER) = 0
>   [pid 129] setresuid(65534, 65534, 65534) = 0
>   [pid 129] execve("/bin/true", ["/bin/true"], 0x552ad950a0 /* 7 vars */) = -1
>   EAGAIN (Resource temporarily unavailable)

Do you happen to know which user the code was running as when prlimit64
was called? Really it only matters before the unshare(CLONE_NEWUSER).

> On the parent commit of 59ec715 the service starts successfully.
>
> This is still reproducible on current master (83e3966).
>
> Relevant patch discussion:
> https://lore.kernel.org/lkml/87lf0g9xq7.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m0a39edf27bc5aabca58b2c2a3d81704818d2c6fe
>
> This more recent thread also seems highly relevant:
> https://lore.kernel.org/lkml/20220207121800.5079-1-mkoutny@xxxxxxxx/

What this looks like is that the user that called unshare had more than
4 processes running.

If that user is root, there is an easy argument for fixing this. Looking
at the behavior from your trace and reading the code, I don't think the
code was running as user root.

If the user that called unshare was not root, the question becomes: what
are you trying to achieve? You say it breaks LimitNPROC with PrivateUsers,
but I don't see how this could have worked reliably in the past, even
without the change. What limit were you expecting to be enforced?

Right now this looks like:

    Set RLIMIT_NPROC to 4.
    Have more than 4 processes.
    The kernel enforces the limit.

There is a lot of weird and goofy history with RLIMIT_NPROC, so I am open
to learning something that would let this be a sensible case. Right now
I unfortunately am not seeing it.

Eric
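For readers following along, here is a minimal user-space model of the accounting described in the reply above ("set RLIMIT_NPROC to 4, have more than 4 processes, the kernel enforces the limit"). It assumes, for illustration only, that the RLIMIT_NPROC value in effect when unshare(CLONE_NEWUSER) was called (4 here) is checked both against `nobody`'s processes inside the new user namespace and against the processes of the user that called unshare in the parent namespace. This is a sketch, not kernel code: `Level`, `first_overlimit`, `model_execve` and the parent-namespace count of 30 are all hypothetical.

```python
import errno
from dataclasses import dataclass


@dataclass
class Level:
    """One user-namespace level in the hypothetical model, innermost first."""
    who: str    # illustrative label only
    count: int  # processes charged to this user at this level
    limit: int  # limit applied at this level (assumption, see lead-in)


def first_overlimit(levels):
    """Return the index of the first level whose count exceeds its limit, or -1."""
    for i, level in enumerate(levels):
        if level.count > level.limit:
            return i
    return -1


def model_execve(levels):
    """Return 0 on success, or -EAGAIN if any level is over its limit."""
    return -errno.EAGAIN if first_overlimit(levels) >= 0 else 0


RLIMIT_NPROC = 4  # LimitNPROC=4 from the unit file

levels = [
    # setresuid(65534, ...) inside the new user namespace: a single process.
    Level("nobody (65534) inside the new user namespace", count=1, limit=RLIMIT_NPROC),
    # Hypothetical count: the user that called unshare(CLONE_NEWUSER)
    # already has more than 4 processes in the parent namespace.
    Level("caller of unshare in the parent namespace", count=30, limit=RLIMIT_NPROC),
]

if __name__ == "__main__":
    idx = first_overlimit(levels)
    print(levels[idx].who if idx >= 0 else "within limits")
    print(model_execve(levels))
```

With a parent-namespace count of 30, the second level trips the limit and the modelled execve returns -EAGAIN, matching the failure in the trace even though `nobody` itself has only one process. Dropping the caller's count to 4 or fewer makes the model succeed.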