On 19/12/2016 08:18, Roman Kagan wrote:
> On Thu, Dec 15, 2016 at 04:09:39PM +0100, Radim Krčmář wrote:
>> 2016-12-15 09:55+0300, Roman Kagan:
>>> On Wed, Dec 14, 2016 at 10:21:11PM +0100, Radim Krčmář wrote:
>>>> async_pf is an optional paravirtual device.  It is L1's fault if it
>>>> enabled something that it doesn't support ...
>>>
>>> async_pf in L1 is enabled by the core Linux; the hypervisor may be
>>> third-party and have no control over it.
>>
>> Admin can pass no-kvmapf to Linux when planning to use a hypervisor
>> that doesn't support paravirtualized async_pf.  Linux allows only
>> in-kernel hypervisors that do have full control over it.
>
> Imagine you are a hoster providing VPSes to your customers.  You have
> basically no control over what they run there.  Now if you are brave
> enough to enable nested, you most certainly won't want async_pf to
> create problems for your customers only because they have a kernel with
> async_pf support and a hypervisor without (which at the moment means a
> significant fraction of VPS owners).

If you enable nested, you should have a (small) list of hypervisors
that you have tested.  KVM and Xen should more or less work, ESX
should too (but I never tested it), and Hyper-V support is new in
4.10.  Anything else should be tested beforehand by the hosting
provider.  If VirtualBox requires no-kvmapf, you need to document that
(see the boot-configuration sketch at the end of this mail).

VirtualBox for Linux is an out-of-tree module; I think it would be
pretty crazy to support it under nested virt, even more so than a
proprietary hypervisor.

Paolo

>>>> AMD's behavior makes sense and already works, therefore I'd like to
>>>> see the same on Intel as well.  (I thought that SVM was broken as
>>>> well, sorry for my misleading first review.)
>>>>
>>>>> To avoid that, only do async_pf stuff when executing L1 guest.
>>>>
>>>> The good thing is that we are already killing VMX L1 with async_pf,
>>>> so regressions don't prevent us from making Intel KVM do the same as
>>>> AMD: force a nested VM exit from nested_vmx_check_exception() if the
>>>> injected #PF is async_pf and handle the #PF VM exit in L1.
>>>
>>> I'm not getting your point: the wealth of existing hypervisors running
>>> in L1 which don't take #PF vmexits can be made not to hang or crash
>>> their guests with a not so complex fix in the L0 hypervisor.  Why do
>>> the users need to update *both* their L0 and L1 hypervisors instead?
>>
>> L1 enables paravirtual async_pf to get notified about L0 page faults,
>> which would allow L1 to reschedule the blocked process and get better
>> performance.  Running a guest is just another process in L1, hence we
>> can assume that L1 is interested in being notified.
>
> That's a nice theory, but in practice there is a fair amount of
> installed VMs with a kernel that requests async_pf and a hypervisor
> that can't live with it.
>
>> If you want a fix without changing L1 hypervisors, then you need to
>> regress KVM on SVM.
>
> I don't buy this argument.  I don't see any significant difference from
> L0's viewpoint between emulating a #PF vmexit and emulating an external
> interrupt vmexit combined with #PF injection into L1.  The latter,
> however, will keep L1 getting along just fine with the existing kernels
> and hypervisors.
>
>> This series regresses needlessly, though -- it forces L1 to wait in L2
>> until the page for L2 is fetched by L0.
>
> Indeed, it's half-baked.  I also just realized that it incorrectly does
> a nested vmexit before L1 vmentry, but #PF injection is attempted on
> the next round, which defeats the whole purpose.
> I'll rework the series once I have the time (hopefully before x-mas).
>
> Thanks,
> Roman.
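
For reference, the knob mentioned above is the guest kernel's
"no-kvmapf" command-line parameter (parsed in arch/x86/kernel/kvm.c),
which disables the paravirtual async_pf client in L1.  A minimal
sketch of how an L1 admin could set it at boot -- the GRUB file path
and regeneration command are distro-dependent and only illustrative:

    # In the L1 guest, append no-kvmapf to the kernel command line,
    # e.g. via /etc/default/grub:
    GRUB_CMDLINE_LINUX="... no-kvmapf"

    # Regenerate the GRUB configuration (command varies by distro):
    grub2-mkconfig -o /boot/grub2/grub.cfg

    # After a reboot, verify the parameter is present:
    cat /proc/cmdline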
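
And a rough sketch of the VMX-side idea Radim describes above (force a
nested VM exit from nested_vmx_check_exception() when the #PF being
injected is an async_pf).  This is not the actual KVM code: the
async_page_fault flag and apf_token field are hypothetical
placeholders for state that L0 would have to record when it queues the
fault.

    /* Sketch only, not actual KVM code. */
    static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
    {
            struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
            unsigned long exit_qual;

            if (nr == PF_VECTOR && vcpu->arch.exception.async_page_fault) {
                    /* hypothetical: reflect the async_pf token to L1 */
                    exit_qual = vcpu->arch.apf.apf_token;
            } else if (vmcs12->exception_bitmap & (1u << nr)) {
                    /* L1 asked to intercept this exception anyway */
                    exit_qual = vcpu->arch.cr2;
            } else {
                    return 0;       /* keep injecting the exception into L2 */
            }

            nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
                              vmcs_read32(VM_EXIT_INTR_INFO), exit_qual);
            return 1;
    }

The key point is that an async_pf #PF is never injected into L2; it is
always reflected to L1 as a #PF VM exit, and L1, which enabled
async_pf in the first place, knows how to interpret the token.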