> Adding Marc, Oliver and kvmarm@xxxxxxxxxxxxxxx
>
> I tried to make the feature available to ARM64 a long time ago, but the
> efforts were discontinued as the significant concern was that no users
> were demanding it [1].
> It's definitely exciting news to know it's an important feature to AWS. I
> guess it's probably another chance to re-evaluate the feature for ARM64?
>
> [1] https://lore.kernel.org/kvmarm/87iloq2oke.wl-maz@xxxxxxxxxx/
>
> Async PF needs two signals sent from host to guest; SDEI (Software
> Delegated Exception Interface) is leveraged for that. So there were two
> series to support SDEI virtualization [1] and Async PF on ARM64 [2].
>
> [1] https://lore.kernel.org/kvmarm/20220527080253.1562538-1-gshan@xxxxxxxxxx/
> [2] https://lore.kernel.org/kvmarm/20210815005947.83699-1-gshan@xxxxxxxxxx/

Thanks for all the information!
This might become useful in the future, when we enable this feature on
ARM, given the improvements we saw on x86.

> I got several questions for Mancini to answer, to help understand the
> situation better.
>
> - VM snapshot is stored somewhere remotely. It means the page fault on
>   instruction fetch becomes expensive. Do we have benchmarks showing how
>   much benefit Async PF brings on x86 in the AWS environment?

In our small local repro (local disk access only), which runs a Java
workload after resuming the Firecracker VM, we saw a 20% performance
regression (from ~80ms to ~100ms), and the time spent outside the VM due
to EPT_VIOLATION increased 3x, from 30ms to 90ms.
This impact is amplified when access is not local.

> - I'm wondering if the data can be fetched from somewhere remotely in the
>   AWS environment?

Without getting into details, yes, any memory page could be remotely
accessed in the worst case.

> - The data can be stored in local DRAM or swapping space; the page fault
>   to fetch data becomes expensive if the data is stored in swapping
>   space. I'm not sure if it's possible the data resides in the swapping
>   space in the AWS environment? Note that the swapping space,
>   corresponding to disk, could be somewhere remotely seated.

In our usage, during resumption almost all pages are missing and are
populated on demand via userfaultfd, either from a local cache (memory
or disk) or from the network.

Thanks,
Riccardo

> Thanks,
> Gavin