On Fri, Jan 25, 2019 at 08:47:57PM +0100, Arkadiusz Miśkiewicz wrote:
> On 25/01/2019 17:37, Tejun Heo wrote:
> > On Fri, Jan 25, 2019 at 08:52:11AM +0100, Arkadiusz Miśkiewicz wrote:
> >> On 24/01/2019 12:21, Arkadiusz Miśkiewicz wrote:
> >>> On 17/01/2019 14:17, Arkadiusz Miśkiewicz wrote:
> >>>> On 17/01/2019 13:25, Aleksa Sarai wrote:
> >>>>> On 2019-01-17, Arkadiusz Miśkiewicz <a.miskiewicz@xxxxxxxxx> wrote:
> >>>>>> Using kernel 4.19.13.
> >>>>>>
> >>>>>> For one cgroup I noticed weird behaviour:
> >>>>>>
> >>>>>> # cat pids.current
> >>>>>> 60
> >>>>>> # cat cgroup.procs
> >>>>>> #
> >>>>>
> >>>>> Are there any zombies in the cgroup? pids.current is linked up directly
> >>>>> to __put_task_struct (so exit(2) won't decrease it, only the task_struct
> >>>>> actually being freed will decrease it).
> >>>>>
> >>>>
> >>>> There are no zombie processes.
> >>>>
> >>>> In mean time the problem shows on multiple servers and so far saw it
> >>>> only in cgroups that were OOMed.
> >>>>
> >>>> What has changed on these servers (yesterday) is turning on
> >>>> memory.oom.group=1 for all cgroups and changing memory.high from 1G to
> >>>> "max" (leaving memory.max=2G limit only).
> >>>>
> >>>> Previously there was no such problem.
> >>>>
> >>>
> >>> I'm attaching reproducer. This time tried on different distribution
> >>> kernel (arch linux).
> >>>
> >>> After 60s pids.current still shows 37 processes even if there are no
> >>> processes running (according to ps aux).
> >>
> >>
> >> The same test on 5.0.0-rc3-00104-gc04e2a780caf and it's easy to
> >> reproduce bug. No processes in cgroup but pids.current reports 91.
> >
> > Can you please see whether the problem can be reproduced on the
> > current linux-next?
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
> I can reproduce on next (5.0.0-rc3-next-20190125), too:

How reliably can you reproduce it? I've run your reproducer several
times with different parameters, but haven't had any luck so far.
How many CPUs and how much total RAM does your machine have?

Can you please provide the corresponding dmesg output?

I've checked the code again, and my wild guess is that these missing
tasks are waiting (maybe hopelessly) for the OOM reaper. The dmesg
output might be very useful here.

Thanks!
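
For anyone who wants to check for the mismatch mechanically rather than
by eye, here is a minimal Python sketch. It is not part of the attached
reproducer; the function names are made up for illustration, and it
assumes a cgroup v2 directory that exposes pids.current and cgroup.procs.
Since pids.current is hierarchical and also counts threads, the sketch
only flags the case reported in this thread: a nonzero charge with no
processes listed anywhere in the subtree.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path


def read_pid_accounting(cgroup_dir):
    """Return (charged, listed) for a cgroup v2 directory.

    charged: the value of pids.current (hierarchical, counts threads too).
    listed:  every PID found in cgroup.procs of this cgroup and its
             descendants.
    """
    cg = Path(cgroup_dir)
    charged = int((cg / "pids.current").read_text().strip())
    listed = []
    # rglob also matches cgroup.procs in cgroup_dir itself.
    for procs_file in cg.rglob("cgroup.procs"):
        listed.extend(int(tok) for tok in procs_file.read_text().split())
    return charged, listed


def looks_leaked(cgroup_dir):
    """True when pids are still charged but no process is listed.

    This is a heuristic (hypothetical helper): a charge that outlives the
    listed processes may also be a short-lived state while tasks are
    being torn down, so sample more than once before drawing conclusions.
    """
    charged, listed = read_pid_accounting(cgroup_dir)
    return charged > 0 and not listed


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_pids.py /sys/fs/cgroup/<path-to-cgroup>")
    charged, listed = read_pid_accounting(sys.argv[1])
    print(f"pids.current={charged} listed_procs={len(listed)}")
    if looks_leaked(sys.argv[1]):
        print("pids are charged, but no processes are listed")
```

Running it a few times after the OOM kill (for example once a second)
would show whether the charge eventually drains or stays stuck.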