Re: [PATCH v1 1/5] KVM: arm64: Enable ring-based dirty memory tracking

Marc Zyngier <maz@xxxxxxxxxx> · Tue, 23 Aug 2022 21:35:27 +0100

On Tue, 23 Aug 2022 15:44:42 +0100,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> 
> On Mon, Aug 22, 2022 at 10:42:15PM +0100, Marc Zyngier wrote:
> > Hi Gavin,
> > 
> > On Mon, 22 Aug 2022 02:58:20 +0100,
> > Gavin Shan <gshan@xxxxxxxxxx> wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On 8/19/22 6:00 PM, Marc Zyngier wrote:
> > > > On Fri, 19 Aug 2022 01:55:57 +0100,
> > > > Gavin Shan <gshan@xxxxxxxxxx> wrote:
> > > >> 
> > > >> The ring-based dirty memory tracking has been available and enabled
> > > >> on x86 for a while. The feature is beneficial when the number of
> > > >> dirty pages is small in a checkpointing system or live migration
> > > >> scenario. More details can be found from fb04a1eddb1a ("KVM: X86:
> > > >> Implement ring-based dirty memory tracking").
> > > >> 
> > > >> This enables the ring-based dirty memory tracking on ARM64. It's
> > > >> notable that no extra reserved ring entries are needed on ARM64
> > > >> because the huge pages are always split into base pages when page
> > > >> dirty tracking is enabled.
> > > > 
> > > > Can you please elaborate on this? Adding a per-CPU ring of course
> > > > results in extra memory allocation, so there must be a subtle
> > > > x86-specific detail that I'm not aware of...
> > > > 
> > > 
> > > Sure. I guess it's helpful to explain how it works in next revision.
> > > Something like below:
> > > 
> > > This enables the ring-based dirty memory tracking on ARM64. The feature
> > > is enabled by CONFIG_HAVE_KVM_DIRTY_RING, detected and enabled by
> > > CONFIG_HAVE_KVM_DIRTY_RING. A ring buffer is created on every vcpu and
> > > each entry is described by 'struct kvm_dirty_gfn'. The ring buffer is
> > > pushed by host when page becomes dirty and pulled by userspace. A vcpu
> > > exit is forced when the ring buffer becomes full. The ring buffers on
> > > all vcpus can be reset by ioctl command KVM_RESET_DIRTY_RINGS.
> > > 
> > > Yes, I think so. Adding a per-CPU ring results in extra memory allocation.
> > > However, it's avoiding synchronization among multiple vcpus when dirty
> > > pages happen on multiple vcpus. More discussion can be found from [1]
> > 
> > Oh, I totally buy the relaxation of the synchronisation (though I
> > doubt this will have any visible effect until we have something like
> > Oliver's patches to allow parallel faulting).
> > 
> 
> Heh, yeah I need to get that out the door. I'll also note that Gavin's
> changes are still relevant without that series, as we do write unprotect
> in parallel at PTE granularity after commit f783ef1c0e82 ("KVM: arm64:
> Add fast path to handle permission relaxation during dirty logging").

Ah, true. Now if only someone could explain how the whole
producer-consumer thing works without a trace of a barrier, that'd be
great...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.