On Fri, Feb 25, 2022, Woodhouse, David wrote:
> On Wed, 2022-02-23 at 16:53 +0000, Sean Christopherson wrote:
> > Don't actually set a request bit in vcpu->requests when making a request
> > purely to force a vCPU to exit the guest.  Logging a request but not
> > actually consuming it would cause the vCPU to get stuck in an infinite
> > loop during KVM_RUN because KVM would see the pending request and bail
> > from VM-Enter to service the request.
>
> Hm, it might be that we *do* want to do some work.
>
> I think there's a problem with the existing kvm_host_map that we
> haven't yet resolved with the new gfn_to_pfn_cache.
>
> Look for the calls to 'kvm_vcpu_unmap(…, true)' in e.g. vmx/nested.c
>
> Now, what if a vCPU is in guest mode, doesn't vmexit back to the L1,
> and its userspace thread takes a signal and returns to userspace?
>
> The pages referenced by those maps may have been written, but because
> the cache is still valid, they haven't been marked as dirty in the KVM
> dirty logs yet.
>
> So, a traditional live migration workflow, once it reaches convergence,
> would pause the vCPUs, copy the final batch of dirty pages to the
> destination, then destroy the VM on the source.
>
> And AFAICT those mapped pages don't actually get marked dirty until
> nested_vmx_free_vcpu() calls vmx_leave_nested().  Which will probably
> trigger the dirty log WARN now, since there's no active vCPU context
> for logging, right?
>
> And the latest copy of those pages never does get copied to the
> destination.
>
> Since I didn't spot that problem until today, the gfn_to_pfn_cache
> design inherited it too.  The 'dirty' flag remains set in the GPC until
> a subsequent revalidate or explicit unmap.
>
> Since we need an active vCPU context to do dirty logging (thanks, dirty
> ring)... and since any time vcpu_run exits to userspace for any reason
> might be the last time we ever get an active vCPU context... I think
> that kind of fundamentally means that we must flush dirty state to the
> log on *every* return to userspace, doesn't it?

I would rather add a variant of mark_page_dirty_in_slot() that takes a
vCPU, which we should have in all cases.  I see no reason to require use
of kvm_get_running_vcpu().
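
Something along these lines, maybe (rough sketch only; the helper name,
the exact signature, and the placement are illustrative assumptions, not
an actual patch — the body just mirrors what mark_page_dirty_in_slot()
already does, minus the kvm_get_running_vcpu() lookup):

	/*
	 * Illustrative variant of mark_page_dirty_in_slot() that takes the
	 * vCPU explicitly instead of grabbing it via kvm_get_running_vcpu().
	 * Would presumably sit next to mark_page_dirty_in_slot() in
	 * virt/kvm/kvm_main.c.
	 */
	void kvm_vcpu_mark_page_dirty_in_slot(struct kvm_vcpu *vcpu,
					      const struct kvm_memory_slot *memslot,
					      gfn_t gfn)
	{
		struct kvm *kvm = vcpu->kvm;
		unsigned long rel_gfn;
		u32 slot;

		if (!memslot || !kvm_slot_dirty_track_enabled(memslot))
			return;

		rel_gfn = gfn - memslot->base_gfn;

		if (kvm->dirty_ring_size) {
			slot = (memslot->as_id << 16) | memslot->id;
			/* Dirty ring entries go through the caller's vCPU. */
			kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);
		} else {
			set_bit_le(rel_gfn, memslot->dirty_bitmap);
		}
	}

Callers that already have the vCPU in hand (e.g. the GPC and nested map
paths discussed above) could then log dirty pages without depending on
there being a "running" vCPU at that point.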