VM worker kthreads can linger in the VM process's cgroup for some time
after KVM terminates the VM process.

KVM terminates the worker kthreads by calling kthread_stop(), which
waits on the 'exited' completion that exit_mm() triggers, via
mm_release(), during the kthread's exit. However, these kthreads are
removed from the cgroup by cgroup_exit(), which runs after exit_mm(). A
VM process can therefore terminate in the window between exit_mm() and
cgroup_exit(), leaving only worker kthreads in the cgroup.

Moving the worker kthreads back to their original cgroup
(kthreadd_task's cgroup) ensures that the cgroup is empty as soon as the
main VM process terminates.

kthreadd_task is not an exported symbol, which causes build errors if
KVM is built as a loadable module. Both users of
cgroup_attach_task_all() (kvm_main and vhost) have the same issue, so
the API now defaults to kthreadd_task when it is called with a NULL
@from argument.

Signed-off-by: Vipin Sharma <vipinsh@xxxxxxxxxx>
---

v2:
- Use kthreadd_task in the cgroup API to avoid build issue.

v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@xxxxxxxxxx/

 kernel/cgroup/cgroup-v1.c |  5 +++++
 virt/kvm/kvm_main.c       | 15 ++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 81c9e0685948..81d4b2f2acf0 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -51,6 +51,8 @@ bool cgroup1_ssid_disabled(int ssid)
  * @from: attach to all cgroups of a given task
  * @tsk: the task to be attached
  *
+ * If @from is NULL then use kthreadd_task for finding the destination cgroups.
+ *
  * Return: %0 on success or a negative errno code on failure
  */
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
@@ -58,6 +60,9 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
 	struct cgroup_root *root;
 	int retval = 0;

+	if (!from)
+		from = kthreadd_task;
+
 	mutex_lock(&cgroup_mutex);
 	percpu_down_write(&cgroup_threadgroup_rwsem);
 	for_each_root(root) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..f7504578c374 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
 	init_context = NULL;

 	if (err)
-		return err;
+		goto out;

 	/* Wait to be woken up by the spawner before proceeding. */
 	kthread_parkme();
@@ -5793,6 +5793,19 @@ static int kvm_vm_worker_thread(void *context)
 	if (!kthread_should_stop())
 		err = thread_fn(kvm, data);

+out:
+	/*
+	 * We need to move the kthread back to its original cgroups, so that it
+	 * doesn't linger in the cgroups of the user process after the user
+	 * process has already terminated.
+	 *
+	 * kthread_stop() waits on 'exited' completion condition which is set
+	 * in exit_mm(), via mm_release(), in do_exit(). However, kthread
+	 * is removed from cgroups in the cgroup_exit() which is called after
+	 * exit_mm(). This causes lingering of kthreads in cgroups after main
+	 * VM process has finished.
+	 */
+	WARN_ON(cgroup_attach_task_all(NULL, current));
 	return err;
 }


base-commit: 5e4e84f1124aa02643833b7ea40abd5a8e964388
-- 
2.34.1.307.g9b7440fafd-goog