Paolo, ping?

On Tue, 25 Aug 2020, David Rientjes wrote:

> There may be many encrypted regions that need to be unregistered when a
> SEV VM is destroyed. This can lead to soft lockups. For example, on a
> host running 4.15:
>
> watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
> CPU: 206 PID: 194348 Comm: t_virtual_machi
> RIP: 0010:free_unref_page_list+0x105/0x170
> ...
> Call Trace:
>  [<0>] release_pages+0x159/0x3d0
>  [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
>  [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
>  [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
>  [<0>] kvm_arch_destroy_vm+0x47/0x200
>  [<0>] kvm_put_kvm+0x1a8/0x2f0
>  [<0>] kvm_vm_release+0x25/0x30
>  [<0>] do_exit+0x335/0xc10
>  [<0>] do_group_exit+0x3f/0xa0
>  [<0>] get_signal+0x1bc/0x670
>  [<0>] do_signal+0x31/0x130
>
> Although the CLFLUSH is no longer issued on every encrypted region to be
> unregistered, there are no other changes that can prevent soft lockups for
> very large SEV VMs in the latest kernel.
>
> Periodically schedule if necessary. This still holds kvm->lock across the
> resched, but since this only happens when the VM is destroyed this is
> assumed to be acceptable.
>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  arch/x86/kvm/svm/sev.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1106,6 +1106,7 @@ void sev_vm_destroy(struct kvm *kvm)
>  	list_for_each_safe(pos, q, head) {
>  		__unregister_enc_region_locked(kvm,
>  			list_entry(pos, struct enc_region, list));
> +		cond_resched();
>  	}
>  }
>
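The one-line change adds a scheduling point to the teardown loop: after each encrypted region is unregistered, cond_resched() gives the scheduler a chance to run something else, so a VM with a very large number of regions no longer walks the whole list without yielding. The sketch below is only a userspace analogue of that pattern, under stated assumptions: struct region, release_region() and release_all() are hypothetical names, and POSIX sched_yield() stands in for cond_resched(), which exists only inside the kernel. Unlike cond_resched(), sched_yield() yields unconditionally rather than only when a reschedule is pending.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for one enc_region entry; not the kernel struct. */
struct region {
    size_t npages;
};

/*
 * Hypothetical per-entry work, standing in for
 * __unregister_enc_region_locked(): "unpins" the region's pages and
 * returns how many were released.
 */
static size_t release_region(struct region *r)
{
    size_t freed = r->npages;
    r->npages = 0;
    return freed;
}

/*
 * Walk every region and offer the CPU back to the scheduler after each
 * entry, mirroring the placement of cond_resched() in the patch.
 * sched_yield() is an approximation: it always yields, whereas
 * cond_resched() only reschedules when the kernel has flagged it.
 */
size_t release_all(struct region *regions, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += release_region(&regions[i]);
        sched_yield(); /* scheduling point between entries */
    }
    return total;
}
```

As in the patch, the yield sits between entries rather than inside the per-entry work, so each individual release still runs to completion before the CPU is offered up.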