Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Vipin Sharma <vipinsh@xxxxxxxxxx> · Tue, 18 Jan 2022 12:25:48 -0800

Hi Michal,

Thanks for looking into this patch. I will follow Sean's suggestion
and use the task's real_parent. This avoids the ugly code I added to
the cgroup APIs.

On Wed, Jan 5, 2022 at 10:04 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hi Vipin.
>
> On Wed, Dec 22, 2021 at 10:53:50PM +0000, Vipin Sharma <vipinsh@xxxxxxxxxx> wrote:
> > VM worker kthreads can linger in the VM process's cgroup for some time
> > after KVM terminates the VM process.
>
> Why is it a problem? And how long are we talking about?
>

Automated tools/scripts which delete VM cgroups after the main KVM
process ends were seeing deletion errors because kernel worker threads
were still running inside those cgroups. This is not a very frequent
issue but we noticed it happens every now and then.

> > A VM process can terminate in the window between exit_mm() and
> > cgroup_exit(), leaving only worker kthreads in the cgroup.
>
> Even kthreads should eventually have PF_EXITING set, they should be
> treated as "user-space" zombies by cgroups, i.e. mostly invisible (e.g.
> it doesn't prevent rmdir'ing the cgroup).
>

Since that delay is not known, we could either pause the script for
some time before starting the cleanup or retry the deletion some
number of times. Neither is preferable because both are
nondeterministic.
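
For illustration, the retry workaround would look roughly like the
sketch below. The helper name, retry count, and delay are made up for
this example and are not taken from our tooling; the only kernel
behavior assumed is that rmdir on a cgroup that still contains live
tasks fails with EBUSY.

```python
import errno
import os
import time


def remove_cgroup_with_retries(path, retries=5, delay=0.1,
                               rmdir=os.rmdir, sleep=time.sleep):
    """Try to rmdir a cgroup directory, retrying while it is busy.

    Hypothetical helper: returns True once the directory is removed,
    False if it is still busy after `retries` attempts. Any error
    other than EBUSY is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            rmdir(path)
            return True
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise
            # Still populated (e.g. lingering worker kthreads); wait
            # before the next attempt, but not after the last one.
            if attempt + 1 < retries:
                sleep(delay)
    return False
```

Whatever values are picked for retries and delay are guesses, which
is exactly the problem: there is no bound we can rely on.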

> (And after the last task_struct reference is gone, the cgroup structs
> can be released too. Maybe the cause is holding the reference to the KVM
> worker thread somewhere for too long.)
>
> > Moving worker kthreads back to the original cgroup (kthreadd_task's
> > cgroup) makes sure that cgroup is empty as soon as the main VM process
> > is terminated.
>
> BTW this used to be done for "user-space" tasks too (migrate to root
> cgroup) but it was replaced with the less transactional "ignore zombies"
> approach. So this change seems inconsistent.
>
Interesting. However, until PF_EXITING is set, those threads will
exhibit the same behavior when it comes to deletion. Thanks for the
background, good to know.

Thanks