Re: [PATCH RFCv1 0/7] Support Async Page Fault

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Marc,

On 4/10/20 10:52 PM, Marc Zyngier wrote:
Hi Gavin,

On 2020-04-10 09:58, Gavin Shan wrote:
There are two stages of page faults and the stage one page fault is
handled by guest itself. The guest is trapped to host when the page
fault is caused by stage 2 page table, for example missing. The guest
is suspended until the requested page is populated. To populate the
requested page can be costly and might be related to IO activities
if the page was swapped out previously. In this case, the guest has
to suspend for a few of milliseconds at least, regardless of the
overall system load.

The series adds support to asychornous page fault to improve above
situation. If it's costly to populate the requested page, a signal
(PAGE_NOT_PRESENT) is sent to guest so that the faulting process can
be rescheduled if it can be. Otherwise, it is put into power-saving
mode. Another signal (PAGE_READY) is sent to guest once the requested
page is populated so that the faulting process can be waken up either
from either waiting state or power-saving state.

In order to fulfil the control flow and convey signals between host
and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
The register accepts control block's physical address, plus requested
features. Also, the signal is sent using data abort with the specific
IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
in the control block by host, to be consumed by guest.

Todo
====
* CONFIG_KVM_ASYNC_PF_SYNC is disabled for now because the exception
   injection can't work in nested mode. It might be something to be
   improved in future.
* KVM_ASYNC_PF_SEND_ALWAYS is disabled even with CONFIG_PREEMPTION
   because it's simply not working reliably.
* Tracepoints, which should something to be done in short term.
* kvm-unit-test cases.
* More testing and debugging are needed. Sometimes, the guest can be
   stuck and the root cause needs to be figured out.

Let me add another few things:

- KVM/arm is (supposed to be) an architectural hypervisor. It means
   that one of the design goal is to have as few differences as possible
   from the actual hardware. I'm not keen on deviating from it (next
   thing you know, you'll add all the PV horror from Xen, HV, VMware...).

- The idea of butchering the arm64 mm subsystem to handle a new exotic
   style of exceptions is not something I am looking forward to. We
   might as well PV the whole MMU, Xen style, and be done with it. I'll
   let the arm64 maintainers comment on this though.


Thanks for your comments. The feature won't be enabled on guest side until
CONFIG_KVM_GUEST is enabled. More details can be found from PATCH[7/7]. So
it would be one specific features supported by KVM. I'm not familiar with
xen and would like to learn how MMU is para-virtualized there. Do you have
documents recommended to start with? Otherwise, I will try google later.

- We don't add IMPDEF sysregs, period. That's reserved for the HW. If
   you want to trap, there's the HVC instruction to that effect.


Yes, HVC can be used for trapping as PV stolen time did. However, I guess
it's guarded by specification? For example, the para-virtualized time calls
are specified by DEN0057A, as highlighted in include/linux/arm-smccc.h.

/* Paravirtualised time calls (defined by ARM DEN0057A) */
#define ARM_SMCCC_HV_PV_TIME_FEATURES                           \
        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,                 \
                           ARM_SMCCC_SMC_64,                    \
                           ARM_SMCCC_OWNER_STANDARD_HYP,        \
                           0x20)

#define ARM_SMCCC_HV_PV_TIME_ST                                 \
        ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,                 \
                           ARM_SMCCC_SMC_64,                    \
                           ARM_SMCCC_OWNER_STANDARD_HYP,        \
                           0x21)

I really don't understand how IMPDEF sysreg is used by hardware vendors.
Do we have an existing functionality, which depends on IMPDEF sysreg?
I was thinking the IMPDEF sysreg can be used by software either, but
it seems I'm wrong.

- If this is such a great improvement, where are the performance
   numbers?


Yep, Ineed. I'm still looking for appropriate workload currently and hopefully,
I can share performance data in RFCv2 :)

- The fact that it apparently cannot work with nesting nor with
   preemption tends to indicate that it isn't future proof.


I didn't make myself clear about the nesting. The data abort exception is injected
by tweaking ELR_EL1/SPSR_EL1 if the guest is runing in 64-bits and EL1 mode. These
registers are loaded when the guest gets chance to run. However, it's impossible to
inject two (nested) data abort exception at once. It's something different from nested
VM.

There was a hot discusson about the preemption support. It's something in the TODO list
and needs to be sorted out in future.

https://lore.kernel.org/patchwork/patch/1206121/


Thanks,

	M.


Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux