On Fri, Nov 25, 2016 at 10:15:21AM +0300, Roman Kagan wrote:
> On Thu, Nov 24, 2016 at 09:49:59PM +0100, Radim Krčmář wrote:
> > 2016-11-24 19:30+0300, Roman Kagan:
> > > Async pagefault machinery assumes communication with L1 guests only: all
> > > the state -- MSRs, apf area addresses, etc, -- are for L1.  However, it
> > > currently doesn't check if the vCPU is running L1 or L2, and may inject
> > >
> > > To reproduce the problem, use a host with swap enabled, run a VM on it,
> > > run a nested VM on top, and set RSS limit for L1 on the host via
> > > /sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
> > > to swap it out (you may need to tighten and release it once or twice, or
> > > create some memory load inside L1).  Very quickly L2 guest starts
> > > receiving pagefaults with bogus %cr2 (apf tokens from the host
> > > actually), and L1 guest starts accumulating tasks stuck in D state in
> > > kvm_async_pf_task_wait.
> > >
> > > To avoid that, only do async_pf stuff when executing L1 guest.
> > >
> > > Note: this patch only fixes x86; other async_pf-capable arches may also
> > > need something similar.
> > >
> > > Signed-off-by: Roman Kagan <rkagan@xxxxxxxxxxxxx>
> > > ---
> >
> > Applied to kvm/queue, thanks.
> >
> > The VM task in L1 could be scheduled out instead of hogging the VCPU for
> > a long time, so L1 might want to handle async_pf, especially if L1 set
> > KVM_ASYNC_PF_SEND_ALWAYS.  Another case happens if L1 scheduled out a
> > high-priority task on async_pf and executed the low-priority VM task in
> > spare time, expecting another #PF when the page is ready, which might be
> > long before the next nested VM exit.
> >
> > Have you considered doing a nested VM exit and delivering the async_pf
> > to L1 immediately?
>
> I haven't, but it seems to make sense indeed for "page ready" async_pfs.
> I'll have a look into it.

What's the correct way to kick L2 to L1 from the host?
I failed to find one from a brief skim through the code.  We need a
sensible exit reason delivered to L1 (probably "external interrupt" will
do), but I don't see a way to do so without actually injecting an
interrupt into L1, which is likely to confuse it.

Any suggestions?

Thanks,
Roman.