Hello,

On Fri, Jun 15, 2018 at 05:26:04PM +0300, Ivan Zahariev wrote:
> The standard RLIMIT_NPROC does not suffer from such accounting
> discrepancies at any time.

RLIMIT_NPROC uses a dedicated atomic counter which is updated when the
process is reaped; however, that doesn't actually coincide with the
pid being freed.  The base pid ref is put at that point, but there can
be other refs, and even after those are dropped the pid has to go
through an RCU grace period before it is actually freed.

The two seem equivalent but serve slightly different purposes.
RLIMIT_NPROC is primarily about limiting what the user can do and
doesn't guarantee that the count actually matches resource (pid here)
consumption.  The pids controller's primary role is limiting pid
consumption, i.e. no matter what happens, the cgroup must not be able
to take more than the specified number from the available pool, which
means it has to account for lazy release and refs that are still
draining.

> The "memory" cgroups controller also does
> not suffer from any discrepancies -- it accounts memory usage in
> real time without any lag on process start or exit. The "tasks" file
> list is also always up-to-date.

The memory controller does the same thing, actually far more
extensively.  It's just less noticeable because people generally
don't try to control memory at the individual page level.

> Is it really technically not possible to make "pids.current" do
> accounting properly like RLIMIT_NPROC does? We were hoping to
> replace RLIMIT_NPROC with the "pids" controller.

It is of course possible, but at a cost.  The cost (getting rid of
the lazy release optimizations) is just not justifiable for most
cases.

Thanks.

-- 
tejun
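
To observe the lag discussed above, one option is to compare
"pids.current" against the number of entries in the "tasks" file for
the same cgroup.  The following is only a rough sketch, not anything
from the kernel tree.  It assumes a cgroup v1 pids hierarchy and a leaf
cgroup (no child cgroups, since pids.current also counts descendants).
The function names and the example path are hypothetical.

```python
from pathlib import Path


def read_pid_accounting(cgroup_dir):
    """Return (charged, listed) for one cgroup directory.

    charged: the value of pids.current, i.e. pids still charged to the
             cgroup, which may include pids that are not yet released.
    listed:  the number of entries in the "tasks" file, i.e. tasks that
             are members of the cgroup right now.

    Assumption: cgroup_dir is a leaf cgroup in a v1 pids hierarchy.
    """
    base = Path(cgroup_dir)
    charged = int((base / "pids.current").read_text().strip())
    listed = len((base / "tasks").read_text().split())
    return charged, listed


def pending_release(cgroup_dir):
    """Pids charged to the cgroup but no longer listed in "tasks".

    The two files are read one after the other, not atomically, so a
    fork or exit in between can skew the result; clamp at zero.
    """
    charged, listed = read_pid_accounting(cgroup_dir)
    return max(charged - listed, 0)
```

For example, pending_release("/sys/fs/cgroup/pids/mygroup") (a
hypothetical path) called right after a burst of short-lived processes
has exited may return a non-zero value for a short while.  That is the
lazy release showing up in the accounting, not a leak.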