Hi Michal,

Thanks for looking into this patch. I will take Sean's suggestion and
use the task's real_parent. This will avoid my ugly code in the cgroup
APIs.

On Wed, Jan 5, 2022 at 10:04 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hi Vipin.
>
> On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <vipinsh@xxxxxxxxxx> wrote:
> > VM worker kthreads can linger in the VM process's cgroup for sometime
> > after KVM terminates the VM process.
>
> Why is it a problem? And how long are we talking about?
>

Automated tools/scripts that delete VM cgroups after the main KVM
process ends were seeing deletion errors because kernel worker threads
were still running inside those cgroups. This is not a very frequent
issue, but we noticed it happens every now and then.

> > A VM process can terminate between the time window of exit_mm() to
> > cgroup_exit(), leaving only worker kthreads in the cgroup.
>
> Even kthreads should eventually have PF_EXITING set, they should be
> treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
> it doesn't prevent rmdir'ing the cgroup).
>

Since that eventual time period is not known, we can either pause the
script for some time before starting the cleanup or retry the deletion
some fixed number of times. Neither is preferable because both are
non-deterministic.

> (And after the last task_struct reference is gone, the cgroup structs
> can be released too. Maybe the cause is holding the reference to the KVM
> worker thread somewhere for too long.)
>
> > Moving worker kthreads back to the original cgroup (kthreadd_task's
> > cgroup) makes sure that cgroup is empty as soon as the main VM process
> > is terminated.
>
> BTW this used to be done for "user-space" tasks too (migrate to root
> cgroup) but it was replaced with the less transactional "ignore zombies"
> approach. So this change seems inconsistent.
>

Interesting. However, until PF_EXITING is set, those threads will also
exhibit the same behavior when it comes to deletion.
Thanks for the background, good to know.

Thanks
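
For reference, the retry workaround mentioned above could look roughly
like the sketch below. It is only an illustration, not what our tooling
actually runs: the name remove_cgroup_dir, its parameters, and the
defaults are made up here, and it assumes rmdir fails with EBUSY while
tasks are still in the cgroup.

```python
import errno
import os
import time


def remove_cgroup_dir(path, retries=5, delay=0.1,
                      rmdir=os.rmdir, sleep=time.sleep):
    """Remove a cgroup directory, retrying while it is still busy.

    Hypothetical helper: returns the number of attempts used. Re-raises
    the OSError if the directory is still busy after `retries` attempts,
    or immediately on any error other than EBUSY.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            rmdir(path)
            return attempt
        except OSError as exc:
            # Only EBUSY (tasks still attached) is worth retrying.
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            sleep(delay)
```

The retry count and delay are arbitrary guesses, which is exactly the
problem: there is no bound that is known to be sufficient.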