Re: [PATCH v2] KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 19 Jan 2022 19:02:53 +0100

On 1/18/22 21:39, Tejun Heo wrote:
> So, these are normally driven by the !populated events. That's how everyone
> else is doing it. If you want to tie the kvm workers lifetimes to kvm
> process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
> exit wait for the workers to be cleaned up.

It does.  For example kvm_mmu_post_init_vm's call to
kvm_vm_create_worker_thread is matched with the call to
kthread_stop in kvm_mmu_pre_destroy_vm.

According to Vipin, the problem is that there's a small amount of time
between the return from kthread_stop and the point where the cgroup
can be removed.  My understanding of the race is the following:

user process			kthread			management
------------			-------			----------
							wait4()
exit_task_work()
  ____fput()
    kvm_mmu_pre_destroy_vm()
      kthread_stop();
        wait_for_completion();
				exit_signals()
				  /* set PF_EXITING */
				exit_mm()
				  exit_mm_release()
				    complete_vfork_done()
				      complete();
cgroup_exit()
  cgroup_set_move_task()
    css_set_update_populated()
exit_notify()
  do_notify_parent()
							<wakeup>
							rmdir()
							  cgroup_destroy_locked()
							    cgroup_is_populated()
							    return -EBUSY
				cgroup_exit()
				  cgroup_set_move_task()
				    css_set_update_populated()

I cannot find the code that makes it possible to rmdir a cgroup
if PF_EXITING is set.
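
To make the management column of the diagram concrete, here is a minimal
userspace sketch of what the management process would have to do if it
simply tolerated the transient -EBUSY.  It is only an illustration of
the symptom, not something proposed in this thread; the helper name
rmdir_retry_ebusy and its max_tries/delay_ms parameters are hypothetical.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: call rmdir() on a cgroup directory and retry
 * while it fails with EBUSY, i.e. while the kernel still considers the
 * cgroup populated because the kthread has not reached cgroup_exit().
 *
 * Returns 0 on success.  Returns -1 with errno set by rmdir() on any
 * other error, or with errno == EBUSY once max_tries is exhausted.
 */
int rmdir_retry_ebusy(const char *path, int max_tries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	for (int i = 0; i < max_tries; i++) {
		if (rmdir(path) == 0)
			return 0;
		if (errno != EBUSY)
			return -1;	/* unrelated failure, do not retry */
		nanosleep(&ts, NULL);	/* give the kthread time to exit */
	}

	errno = EBUSY;
	return -1;
}
```

A caller would invoke it after wait4() returns, for example
rmdir_retry_ebusy(path, 10, 5), where path is whatever cgroup directory
the management process created for the VM.  Polling like this only
papers over the window shown above.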

Paolo