Re: [PATCH] KVM: x86: async_pf: check earlier if can deliver async pf

Nikita Kalyazin <kalyazin@xxxxxxxxxx> · Mon, 25 Nov 2024 15:50:05 +0000

On 21/11/2024 21:05, Sean Christopherson wrote:
On Thu, Nov 21, 2024, Nikita Kalyazin wrote:
On 19/11/2024 13:24, Sean Christopherson wrote:
None of this justifies breaking host-side, non-paravirt async page faults.  If a
vCPU hits a missing page, KVM can schedule out the vCPU and let something else
run on the pCPU, or enter idle and let the SMT sibling get more cycles, or maybe
even enter a low enough sleep state to let other cores turbo a wee bit.

I have no objection to disabling host async page faults, e.g. it's probably a net
negative for 1:1 vCPU:pCPU pinned setups, but such disabling needs an opt-in from
userspace.

That's a good point, I didn't think about it.  The async work would still
need to execute somewhere in that case (or sleep in GUP until the page is
available).

The "async work" is often an I/O operation, e.g. to pull in the page from disk,
or over the network from the source.  The *CPU* doesn't need to actively do
anything for those operations.  The I/O is initiated, so the CPU can do something
else, or go idle if there's no other work to be done.

If processing the fault synchronously, the vCPU thread can also sleep in the
same way freeing the pCPU for something else,

If and only if the vCPU can handle a PV async #PF.  E.g. if the guest kernel flat
out doesn't support PV async #PF, or the fault happened while the guest was in an
incompatible mode, etc.

If KVM doesn't do async #PFs of any kind, the vCPU will spin on the fault until
the I/O completes and the page is ready.

I ran a little experiment to see that by backing guest memory by a file 
on FUSE and delaying response to one of the read operations to emulate a 
delay in fault processing.

1. Original (the patch isn't applied)

vCPU thread (disk-sleeping):

[<0>] kvm_vcpu_block+0x62/0xe0
[<0>] kvm_arch_vcpu_ioctl_run+0x240/0x1e30
[<0>] kvm_vcpu_ioctl+0x2f1/0x860
[<0>] __x64_sys_ioctl+0x87/0xc0
[<0>] do_syscall_64+0x47/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Async task (disk-sleeping):

[<0>] folio_wait_bit_common+0x116/0x2e0
[<0>] filemap_fault+0xe5/0xcd0
[<0>] __do_fault+0x30/0xc0
[<0>] do_fault+0x9a/0x580
[<0>] __handle_mm_fault+0x684/0x8a0
[<0>] handle_mm_fault+0xc9/0x220
[<0>] __get_user_pages+0x248/0x12c0
[<0>] get_user_pages_remote+0xef/0x470
[<0>] async_pf_execute+0x99/0x1c0
[<0>] process_one_work+0x145/0x360
[<0>] worker_thread+0x294/0x3b0
[<0>] kthread+0xdb/0x110
[<0>] ret_from_fork+0x2d/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

2. With the patch applied (no async task)

vCPU thread (disk-sleeping):

[<0>] folio_wait_bit_common+0x116/0x2e0
[<0>] filemap_fault+0xe5/0xcd0
[<0>] __do_fault+0x30/0xc0
[<0>] do_fault+0x36f/0x580
[<0>] __handle_mm_fault+0x684/0x8a0
[<0>] handle_mm_fault+0xc9/0x220
[<0>] __get_user_pages+0x248/0x12c0
[<0>] get_user_pages_unlocked+0xf7/0x380
[<0>] hva_to_pfn+0x2a2/0x440
[<0>] __kvm_faultin_pfn+0x5e/0x90
[<0>] kvm_mmu_faultin_pfn+0x1ec/0x690
[<0>] kvm_tdp_page_fault+0xba/0x160
[<0>] kvm_mmu_do_page_fault+0x1cc/0x210
[<0>] kvm_mmu_page_fault+0x8e/0x600
[<0>] vmx_handle_exit+0x14c/0x6c0
[<0>] kvm_arch_vcpu_ioctl_run+0xeb1/0x1e30
[<0>] kvm_vcpu_ioctl+0x2f1/0x860
[<0>] __x64_sys_ioctl+0x87/0xc0
[<0>] do_syscall_64+0x47/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

In both cases the fault handling code is blocked and the pCPU is free 
for other tasks.  I can't see the vCPU spinning on the IO to get 
completed if the async task isn't created.  I tried that with and 
without async PF enabled by the guest (MSR_KVM_ASYNC_PF_EN).

What am I missing?

so the amount of work to be done looks equivalent (please correct me
otherwise).  What's the net gain of moving that to an async work in the host
async fault case? "while allowing interrupt delivery into the guest." -- is
this the main advantage?