On Thu, Nov 24, 2016 at 09:49:59PM +0100, Radim Krčmář wrote:
> 2016-11-24 19:30+0300, Roman Kagan:
> > Async pagefault machinery assumes communication with L1 guests only:
> > all the state -- MSRs, apf area addresses, etc. -- is for L1.  However,
> > it currently doesn't check whether the vCPU is running L1 or L2, and
> > may inject async page faults into L2.
> >
> > To reproduce the problem, use a host with swap enabled, run a VM on it,
> > run a nested VM on top, and set the RSS limit for L1 on the host via
> > /sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
> > to swap it out (you may need to tighten and release it once or twice,
> > or create some memory load inside L1).  Very quickly the L2 guest
> > starts receiving page faults with bogus %cr2 (apf tokens from the host,
> > actually), and the L1 guest starts accumulating tasks stuck in D state
> > in kvm_async_pf_task_wait.
> >
> > To avoid that, only do async_pf stuff when executing the L1 guest.
> >
> > Note: this patch only fixes x86; other async_pf-capable arches may also
> > need something similar.
> >
> > Signed-off-by: Roman Kagan <rkagan@xxxxxxxxxxxxx>
> > ---
>
> Applied to kvm/queue, thanks.
>
> The VM task in L1 could be scheduled out instead of hogging the VCPU for
> a long time, so L1 might want to handle async_pf, especially if L1 set
> KVM_ASYNC_PF_SEND_ALWAYS.  Another case happens if L1 scheduled out a
> high-priority task on async_pf and executed the low-priority VM task in
> its spare time, expecting another #PF when the page is ready, which
> might come long before the next nested VM exit.
>
> Have you considered doing a nested VM exit and delivering the async_pf
> to L1 immediately?

I haven't, but it does indeed seem to make sense for "page ready"
async_pfs.  I'll have a look into it.

Thanks,
Roman.
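
As a side note on the fix discussed above: "only do async_pf stuff when
executing the L1 guest" amounts to gating every async_pf delivery on a
"not currently running L2" check.  The user-space model below only
illustrates that decision; it is not the actual KVM code, and the names
toy_vcpu and apf_can_deliver are invented for this sketch.

/*
 * Toy user-space model of the guest-mode gating described above.
 * NOT kernel code: it only models whether an async_pf event may be
 * delivered to the vCPU right now.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_vcpu {
	bool in_guest_mode;	/* true while the vCPU is running L2 */
	bool apf_enabled;	/* L1 enabled async_pf via its MSR */
	bool can_take_pf;	/* L1 is currently able to take the #PF */
};

/* "Only do async_pf stuff when executing the L1 guest." */
static bool apf_can_deliver(const struct toy_vcpu *vcpu)
{
	if (vcpu->in_guest_mode)
		return false;	/* apf protocol state belongs to L1, not L2 */
	return vcpu->apf_enabled && vcpu->can_take_pf;
}

int main(void)
{
	struct toy_vcpu running_l1 = { false, true, true };
	struct toy_vcpu running_l2 = { true,  true, true };

	printf("vCPU in L1: deliver async_pf? %s\n",
	       apf_can_deliver(&running_l1) ? "yes" : "no, defer");
	printf("vCPU in L2: deliver async_pf? %s\n",
	       apf_can_deliver(&running_l2) ? "yes" : "no, defer");
	return 0;
}

Run as a normal C program, it answers "yes" only for the L1 case and
"no, defer" for the L2 case, i.e. the notification waits until the vCPU
is back in L1, which is the behaviour the patch description asks for.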