Re: Bug? Incompatible APF for 4.14 guest on 5.10 and later host

"Mancini, Riccardo" <mancio@xxxxxxxxxx> writes:

> Hi,
>
> when a 4.14 guest runs on a 5.10 host (and later), it cannot use APF (despite
> CPUID advertising KVM_FEATURE_ASYNC_PF) due to the new interrupt-based
> mechanism 2635b5c4a0 (KVM: x86: interrupt based APF 'page ready' event delivery).
> Kernels after 5.9 won't honor the guest's request to enable APF through
> KVM_ASYNC_PF_ENABLED alone; they additionally require KVM_ASYNC_PF_DELIVERY_AS_INT to be set.
> Furthermore, the patch set seems to be dropping parts of the legacy #PF handling
> as well.
> I consider this a bug: by breaking the underlying ABI, it breaks APF compatibility
> for older guests running on newer host kernels.
> What do you think? Was this a deliberate decision?
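
The enable check described in the quoted report can be modeled in a few lines. This is a simplified Python sketch, not the kernel's C code: the flag values follow the x86 KVM paravirt header (kvm_para.h), while the function name `apf_enabled_on_new_host` and the address `PA` are hypothetical. It shows why a value written by a 4.14 guest, which lacks KVM_ASYNC_PF_DELIVERY_AS_INT, leaves APF disabled on a 5.9+ host.

```python
# Flag bits of the MSR_KVM_ASYNC_PF_EN value, as in kvm_para.h.
KVM_ASYNC_PF_ENABLED = 1 << 0
KVM_ASYNC_PF_SEND_ALWAYS = 1 << 1
KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT = 1 << 2
KVM_ASYNC_PF_DELIVERY_AS_INT = 1 << 3


def apf_enabled_on_new_host(msr_val: int) -> bool:
    """Simplified model of the 5.9+ host test: async PF counts as enabled
    only if BOTH the enable bit and the interrupt-delivery bit are set."""
    mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT
    return (msr_val & mask) == mask


# Hypothetical 64-byte-aligned guest physical address of the per-CPU
# APF reason area (the low 6 bits of the MSR value carry flags).
PA = 0x1234_0000

# A 4.14 guest sets the enable bit (plus SEND_ALWAYS with CONFIG_PREEMPT).
legacy = PA | KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_SEND_ALWAYS
# An interrupt-aware guest also requests interrupt-based delivery.
modern = PA | KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT

print("legacy guest:", "enabled" if apf_enabled_on_new_host(legacy) else "disabled")
print("modern guest:", "enabled" if apf_enabled_on_new_host(modern) else "disabled")
```

Running it prints "disabled" for the legacy value and "enabled" for the interrupt-aware one, which matches the behavior reported above.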

It was. #PF based "page ready" injection was found to be fragile as in
some cases it can collide with an actual #PF and nothing good is
expected if this ever happens. I don't think we've actually broken the
ABI as "asynchronous page fault" was always a "best effort" service: the
guest indicates its readiness to process 'page missing' events but the
host is under no obligation to actually send such notifications.

> Has this been reported before? (I couldn't find anything on the mailing list,
> but I might have missed it.)

I think it was Andy Lutomirski who started the discussion, see
e.g. https://lore.kernel.org/lkml/ed71d0967113a35f670a9625a058b8e6e0b2f104.1583547991.git.luto@xxxxxxxxxx/

The patch is about KVM_ASYNC_PF_SEND_ALWAYS, but if you follow the
discussion further you'll find more concerns expressed.

> Would it be much effort to support the legacy #PF based mechanism for older
> guests that choose to only set KVM_ASYNC_PF_ENABLED?

Personally, I wouldn't go down this road: #PF injection at random times
(for 'page ready' events) is still considered fragile.

>
> The reason this is an issue for us now is that, when we update the host kernel
> from 4.14 to 5.10, losing APF introduces a significant performance regression
> for 4.14 guests whose "remote" page faults are handled via uffd (similar to a
> live migration scenario).

What about backporting the interrupt-based APF mechanism to older guest kernels?
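
For the backport suggested above, the guest-side opt-in might look roughly like the following Python model. It is a sketch under stated assumptions: the feature-bit numbers and flag values follow kvm_para.h, but `has_feature` and `apf_msr_value` are hypothetical helpers. A real backport would also have to program MSR_KVM_ASYNC_PF_INT with the notification vector before enabling APF and acknowledge each 'page ready' interrupt via MSR_KVM_ASYNC_PF_ACK; none of that is shown here.

```python
from typing import Optional

# Paravirt feature bit numbers (CPUID leaf 0x40000001, EAX), as in kvm_para.h.
KVM_FEATURE_ASYNC_PF = 4
KVM_FEATURE_ASYNC_PF_VMEXIT = 10
KVM_FEATURE_ASYNC_PF_INT = 14

# Flag bits of the MSR_KVM_ASYNC_PF_EN value.
KVM_ASYNC_PF_ENABLED = 1 << 0
KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT = 1 << 2
KVM_ASYNC_PF_DELIVERY_AS_INT = 1 << 3


def has_feature(cpuid_eax: int, bit: int) -> bool:
    """Hypothetical helper: test one paravirt feature bit."""
    return bool(cpuid_eax & (1 << bit))


def apf_msr_value(pa: int, cpuid_eax: int) -> Optional[int]:
    """Hypothetical helper: the MSR_KVM_ASYNC_PF_EN value a backported
    guest would write, or None if it should leave APF disabled."""
    # Opt in only when the host advertises interrupt-based delivery;
    # the legacy #PF-based 'page ready' path is never requested.
    if not has_feature(cpuid_eax, KVM_FEATURE_ASYNC_PF_INT):
        return None
    val = pa | KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT
    if has_feature(cpuid_eax, KVM_FEATURE_ASYNC_PF_VMEXIT):
        val |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
    return val
```

With a host advertising only KVM_FEATURE_ASYNC_PF, the model returns None, so the guest simply runs without APF, consistent with APF being a best-effort service.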

-- 
Vitaly
