On 2/22/22 06:48, Vipin Sharma wrote:
VM worker kthreads can linger in the VM process's cgroup for sometime
after KVM terminates the VM process.
KVM terminates the worker kthreads by calling kthread_stop() which waits
on the 'exited' completion, triggered by exit_mm(), via mm_release(), in
do_exit() during the kthread's exit. However, these kthreads are
removed from the cgroup using the cgroup_exit() which happens after the
exit_mm(). Therefore, A VM process can terminate in between the
exit_mm() and cgroup_exit() calls, leaving only worker kthreads in the
cgroup.
Moving worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that the cgroup is empty as soon as the main VM
process is terminated.
Signed-off-by: Vipin Sharma <vipinsh@xxxxxxxxxx>
Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
Queued, thanks.
Paolo
Thanks Sean, for the example on how to use the real_parent outside of the RCU
critical region. I wrote your name in Suggested-by, I hope you are fine with
it and this is the right tag/way to give you the credit.
v4:
- Read task's real_parent in the RCU critical section.
- Don't log error message from the cgroup_attach_task_all() API.
v3: https://lore.kernel.org/lkml/20220217061616.3303271-1-vipinsh@xxxxxxxxxx/
- Use 'current->real_parent' (kthreadd_task) in the
cgroup_attach_task_all() call.
- Revert cgroup APIs changes in v2. Now, patch does not touch cgroup
APIs.
- Update commit and comment message
v2: https://lore.kernel.org/lkml/20211222225350.1912249-1-vipinsh@xxxxxxxxxx/
- Use kthreadd_task in the cgroup API to avoid build issue.
v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@xxxxxxxxxx/
virt/kvm/kvm_main.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 83c57bcc6eb6..cdf1fa3c60ae 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5810,6 +5810,7 @@ static int kvm_vm_worker_thread(void *context)
* we have to locally copy anything that is needed beyond initialization
*/
struct kvm_vm_worker_thread_context *init_context = context;
+ struct task_struct *parent;
struct kvm *kvm = init_context->kvm;
kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
uintptr_t data = init_context->data;
@@ -5836,7 +5837,7 @@ static int kvm_vm_worker_thread(void *context)
init_context = NULL;
if (err)
- return err;
+ goto out;
/* Wait to be woken up by the spawner before proceeding. */
kthread_parkme();
@@ -5844,6 +5845,25 @@ static int kvm_vm_worker_thread(void *context)
if (!kthread_should_stop())
err = thread_fn(kvm, data);
+out:
+ /*
+ * Move kthread back to its original cgroup to prevent it lingering in
+ * the cgroup of the VM process, after the latter finishes its
+ * execution.
+ *
+ * kthread_stop() waits on the 'exited' completion condition which is
+ * set in exit_mm(), via mm_release(), in do_exit(). However, the
+ * kthread is removed from the cgroup in the cgroup_exit() which is
+ * called after the exit_mm(). This causes the kthread_stop() to return
+ * before the kthread actually quits the cgroup.
+ */
+ rcu_read_lock();
+ parent = rcu_dereference(current->real_parent);
+ get_task_struct(parent);
+ rcu_read_unlock();
+ cgroup_attach_task_all(parent, current);
+ put_task_struct(parent);
+
return err;
}
base-commit: 1bbc60d0c7e5728aced352e528ef936ebe2344c0