On Wed, Jan 19, 2022 at 10:30 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> On Wed, Jan 19, 2022 at 07:02:53PM +0100, Paolo Bonzini wrote:
> > On 1/18/22 21:39, Tejun Heo wrote:
> > > So, these are normally driven by the !populated events. That's how everyone
> > > else is doing it. If you want to tie the kvm workers lifetimes to kvm
> > > process, wouldn't it be cleaner to do so from kvm side? ie. let kvm process
> > > exit wait for the workers to be cleaned up.
> >
> > It does. For example kvm_mmu_post_init_vm's call to
> > kvm_vm_create_worker_thread is matched with the call to
> > kthread_stop in kvm_mmu_pre_destroy_vm.
> > According to Vpin, the problem is that there's a small amount of time
> > between the return from kthread_stop and the point where the cgroup
> > can be removed. My understanding of the race is the following:
>
> Okay, this is because kthread_stop piggy backs on vfork_done to wait for the
> task exit instead of the usual exit notification, so it only waits till
> exit_mm(), which is uhh... weird. So, migrating is one option, I guess,
> albeit a rather ugly one. It'd be nicer if we can make kthread_stop()
> waiting more regular but I couldn't find a good existing place and routing
> the usual parent signaling might be too complicated. Anyone has better
> ideas?
>

Sean suggested using the real_parent of the kthread task, which will
always be kthreadd_task. I like that approach and will give it a try;
it avoids any changes to the cgroup API.
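
To make the ordering concrete, here is a toy userspace model of the race
and of the migrate-before-exit idea. It is not kernel code: every type
and function name below is hypothetical, and it only assumes what is
described above, namely that the kthread_stop() waiter is woken at the
exit_mm() stage, that the task leaves its cgroup at a later exit step,
and that a cgroup cannot be removed while it still has a task in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; none of these are kernel types. */
struct cgroup_model {
	int nr_tasks;                /* tasks currently attached */
};

struct task_model {
	struct cgroup_model *cgrp;   /* cgroup the task currently belongs to */
	bool vfork_done;             /* what the kthread_stop() waiter waits on */
};

/* Removal only succeeds once the cgroup is no longer populated. */
static bool cgroup_rmdir_ok(const struct cgroup_model *c)
{
	return c->nr_tasks == 0;
}

/* Move a task from its current cgroup to another one. */
static void attach(struct task_model *t, struct cgroup_model *to)
{
	t->cgrp->nr_tasks--;
	to->nr_tasks++;
	t->cgrp = to;
}

/* Early exit step: the point at which the kthread_stop() waiter is woken. */
static void exit_step_mm(struct task_model *t)
{
	t->vfork_done = true;
}

/* Later exit step: the task finally leaves its cgroup. */
static void exit_step_cgroup(struct task_model *t)
{
	t->cgrp->nr_tasks--;
	t->cgrp = NULL;
}

/*
 * Returns whether removing the VM's cgroup would succeed immediately
 * after the modelled kthread_stop() returns.
 */
bool rmdir_after_stop(bool migrate_first)
{
	struct cgroup_model parent_cg = { 0 };   /* cgroup of the worker's parent */
	struct cgroup_model vm_cg = { 1 };       /* only the worker is left here */
	struct task_model worker = { &vm_cg, false };
	bool ok;

	if (migrate_first)
		attach(&worker, &parent_cg);     /* worker moves itself back first */

	exit_step_mm(&worker);                   /* modelled kthread_stop() returns */
	ok = cgroup_rmdir_ok(&vm_cg);            /* VM process tries to remove now */
	exit_step_cgroup(&worker);               /* only happens afterwards */
	return ok;
}
```

In this model rmdir_after_stop(false) returns false (the worker is still
counted in the VM's cgroup when the removal is attempted), while
rmdir_after_stop(true) returns true because the worker has already been
moved to its parent's cgroup before it starts exiting.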