On 19/12/2016 08:18, Roman Kagan wrote:
> On Thu, Dec 15, 2016 at 04:09:39PM +0100, Radim Krčmář wrote:
>> 2016-12-15 09:55+0300, Roman Kagan:
>>> On Wed, Dec 14, 2016 at 10:21:11PM +0100, Radim Krčmář wrote:
>>>> async_pf is an optional paravirtual device.  It is L1's fault if it
>>>> enabled something that it doesn't support ...
>>>
>>> async_pf in L1 is enabled by the core Linux; the hypervisor may be
>>> third-party and have no control over it.
>>
>> Admin can pass no-kvmapf to Linux when planning to use a hypervisor
>> that doesn't support paravirtualized async_pf.  Linux allows only
>> in-kernel hypervisors that do have full control over it.
>
> Imagine you are a hoster providing VPSes to your customers.  You have
> basically no control over what they run there.  Now if you are brave
> enough to enable nested, you most certainly won't want async_pf to
> create problems for your customers only because they have a kernel with
> async_pf support and a hypervisor without (which at the moment means a
> significant fraction of VPS owners).

If you enable nested, you should have a (small) list of hypervisors
that you have tested.  KVM and Xen should more or less work, ESX
should too (but I never tested it), and Hyper-V support is new in
4.10.  Anything else should be tested beforehand by the hosting
provider.  If VirtualBox requires no-kvmapf, you need to document that
(see the boot-configuration sketch at the end of this mail).

VirtualBox for Linux is an out-of-tree module; I think it would be
pretty crazy to support it under nested virt, even more so than a
proprietary hypervisor.

Paolo

>>>> AMD's behavior makes sense and already works, therefore I'd like to
>>>> see the same on Intel as well.  (I thought that SVM was broken as
>>>> well, sorry for my misleading first review.)
>>>>
>>>>> To avoid that, only do async_pf stuff when executing L1 guest.
>>>>
>>>> The good thing is that we are already killing VMX L1 with async_pf,
>>>> so regressions don't prevent us from making Intel KVM do the same as
>>>> AMD: force a nested VM exit from nested_vmx_check_exception() if the
>>>> injected #PF is async_pf and handle the #PF VM exit in L1.
>>>
>>> I'm not getting your point: the wealth of existing hypervisors running
>>> in L1 which don't take #PF vmexits can be made not to hang or crash
>>> their guests with a not so complex fix in the L0 hypervisor.  Why do
>>> the users need to update *both* their L0 and L1 hypervisors instead?
>>
>> L1 enables paravirtual async_pf to get notified about L0 page faults,
>> which would allow L1 to reschedule the blocked process and get better
>> performance.  Running a guest is just another process in L1, hence we
>> can assume that L1 is interested in being notified.
>
> That's a nice theory, but in practice there is a fair amount of
> installed VMs with a kernel that requests async_pf and a hypervisor
> that can't live with it.
>
>> If you want a fix without changing L1 hypervisors, then you need to
>> regress KVM on SVM.
>
> I don't buy this argument.  I don't see any significant difference from
> L0's viewpoint between emulating a #PF vmexit and emulating an external
> interrupt vmexit combined with #PF injection into L1.  The latter,
> however, will keep L1 getting along just fine with the existing kernels
> and hypervisors.
>
>> This series regresses needlessly, though -- it forces L1 to wait in L2
>> until the page for L2 is fetched by L0.
>
> Indeed, it's half-baked.  I also just realized that it incorrectly does
> a nested vmexit before L1 vmentry, but #PF injection is attempted on
> the next round, which defeats the whole purpose.
> I'll rework the series once I have the time (hopefully before x-mas).
>
> Thanks,
> Roman.
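
For reference, the knob mentioned above is the guest kernel's
"no-kvmapf" command-line parameter (parsed in arch/x86/kernel/kvm.c),
which disables the paravirtual async_pf client in L1.  A minimal
sketch of how an L1 admin could set it at boot -- the GRUB file path
and regeneration command are distro-dependent and only illustrative:

    # In the L1 guest, append no-kvmapf to the kernel command line,
    # e.g. via /etc/default/grub:
    GRUB_CMDLINE_LINUX="... no-kvmapf"

    # Regenerate the GRUB configuration (command varies by distro):
    grub2-mkconfig -o /boot/grub2/grub.cfg

    # After a reboot, verify the parameter is present:
    cat /proc/cmdline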
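
And a rough sketch of the VMX-side idea Radim describes above (force a
nested VM exit from nested_vmx_check_exception() when the #PF being
injected is an async_pf).  This is not the actual KVM code: the
async_page_fault flag and apf_token field are hypothetical
placeholders for state that L0 would have to record when it queues the
fault.

    /* Sketch only, not actual KVM code. */
    static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
    {
            struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
            unsigned long exit_qual;

            if (nr == PF_VECTOR && vcpu->arch.exception.async_page_fault) {
                    /* hypothetical: reflect the async_pf token to L1 */
                    exit_qual = vcpu->arch.apf.apf_token;
            } else if (vmcs12->exception_bitmap & (1u << nr)) {
                    /* L1 asked to intercept this exception anyway */
                    exit_qual = vcpu->arch.cr2;
            } else {
                    return 0;       /* keep injecting the exception into L2 */
            }

            nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
                              vmcs_read32(VM_EXIT_INTR_INFO), exit_qual);
            return 1;
    }

The key point is that an async_pf #PF is never injected into L2; it is
always reflected to L1 as a #PF VM exit, and L1, which enabled
async_pf in the first place, knows how to interpret the token.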