Re: [PATCH 0/6] Support Asynchronous Page Fault

Gavin Shan <gshan@xxxxxxxxxx> · Mon, 2 Nov 2020 16:23:37 +1100

Hi James,

On 10/24/20 3:54 AM, James Morse wrote:
> I think this series would benefit from being in smaller pieces. I got lost in patch 4 for
> quite a while. Suggestion on where to do that in patch 4.

Yes, I will split the patches into smaller ones for easier review
in the next revision. Thanks for your comments :)

> On 18/08/2020 02:13, Gavin Shan wrote:
>> There are two stages of page fault. The guest kernel is responsible
>> for handling stage one page fault, while the host kernel is to take
>> care of the stage two page fault. When page fault is triggered because
>> of stage two page fault, the guest is suspended until the requested
>> memory (page) is populated. Sometimes, the cost to populate the requested
>> page isn't cheap and can take hundreds of milliseconds in extreme
>> cases. This impacts the overall guest's performance.

> You really need to use postcopy live migration to justify these changes. Otherwise the
> story here is "over-committed hosts suck", which I don't think anyone cares about.

Yes, I will use live migration as the justification in the next revision :)

>> This series introduces the feature (asynchronous page fault) to resolve
>> the issue and improve the guest's performance. It depends on the series
>> to support SDEI virtualization and refactoring SDEI client driver.

> SDEI gives you an NMI ... which you use to set a TIF flag. This can only work reliably for
> user-space. So much so that you have code in the hypervisor to only deliver the NMI ...
> when in user-space.
> The only reason you would need an NMI is to interrupt interrupts-masked code. Linux can't
> reschedule when this is the case.
>
> I can only conclude, you really don't need an NMI here.
>
> Why couldn't we use an IRQ here, it would be a lot simpler? ... the reason is the arm
> architecture can't guarantee us that we take the irq when there is also a stage2 fault for
> the first instruction.
> I reckon we can work around this in the hypervisor:
> https://lore.kernel.org/r/20201023165108.15061-1-james.morse@xxxxxxx
>
> My problem with SDEI is, you don't really need an NMI, and it creates extra in-kernel
> state that has to be migrated. I think having this state in the kernel complicates the
> user-space handling of SIGBUS_MCEERR_AO signals that don't get delivered to a vCPU thread.

Currently, asynchronous page fault is only supported for memory accesses in
the guest's userspace, but we needn't stick to that use model in the future.
Asynchronous page fault could also be supported for memory accesses in the
guest's kernel space, where interrupts can be disabled or masked. So an NMI is
needed, and SDEI fits the use model very well, as Paolo replied in another thread.

About the feature to support SDEI virtualization, I thought there might be some
use cases where the emulated devices need to inject SDEI events into the guest.
However, I'm not very familiar with the architecture yet. If it's required by
the emulated devices, there is more justification to merge the code. However,
the implementation itself isn't simple and I would say it's complicated.

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm