Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 21 Oct 2022 16:05:26 +0000

On Fri, Oct 21, 2022, Marc Zyngier wrote:
> On Fri, 21 Oct 2022 00:44:51 +0100,
> Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > 
> > On Tue, Oct 11, 2022, Gavin Shan wrote:
> > > Some architectures (such as arm64) need to dirty memory outside of the
> > > context of a vCPU. Of course, this simply doesn't fit with the UAPI of
> > > KVM's per-vCPU dirty ring.
> > 
> > What is the point of using the dirty ring in this case?  KVM still
> > burns a pile of memory for the bitmap.  Is the benefit that
> > userspace can get away with scanning the bitmap fewer times,
> > e.g. scan it once just before blackout under the assumption that
> > very few pages will dirty the bitmap?
> 
> Apparently, the throttling effect of the ring makes it easier to
> converge. Someone who actually uses the feature should be able to
> tell you. But that's a policy decision, and I don't see why we should
> be prescriptive.

I wasn't suggesting we be prescriptive, it was an honest question.

> > Why not add a global ring to @kvm?  I assume thread safety is a
> > problem, but the memory overhead of the dirty_bitmap also seems like
> > a fairly big problem.
> 
> Because we already have a stupidly bloated API surface, and that we
> could do without yet another one based on a sample of *one*?

But we're adding a new API regardless.  A per-VM ring would definitely be a bigger
addition, but if using the dirty_bitmap won't actually meet the needs of userspace,
then we'll have added a new API and still not have solved the problem.  That's why
I was asking why/when userspace would want to use dirty_ring+dirty_bitmap.

> Because dirtying memory outside of a vcpu context makes it incredibly awkward
> to handle a "ring full" condition?

Kicking all vCPUs with the soft-full request isn't _that_ awkward.  It's certainly
sub-optimal, but if inserting into the per-VM ring is relatively rare, then in
practice it's unlikely to impact guest performance.

> > > Introduce a new flavor of dirty ring that requires the use of both vCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-vCPU
> > > sources of dirty memory (such as the GIC ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before
> > > migrating the VM to the target.
> > > 
> > > Use an additional capability to advertize this behavior and require
> > > explicit opt-in to avoid breaking the existing dirty ring ABI. And yes,
> > > you can use this with your preferred flavor of DIRTY_RING[_ACQ_REL]. Do
> > > not allow userspace to enable dirty ring if it hasn't also enabled the
> > > ring && bitmap capability, as a VM is likely DOA without the pages
> > > marked in the bitmap.
> 
> This is wrong. The *only* case this is useful is when there is an
> in-kernel producer of data outside of the context of a vcpu, which is
> so far only the ITS save mechanism. No ITS? No need for this.

How large is the ITS?  If it's a fixed, small size, could we treat the ITS as a
one-off case for now?  E.g. do something gross like shove entries into vcpu0's
dirty ring?

> Userspace knows what it has created the first place, and should be in
> charge of it (i.e. I want to be able to migrate my GICv2 and
> GICv3-without-ITS VMs with the rings only).

Ah, so enabling the dirty bitmap isn't strictly required.  That means this patch
is wrong, and it also means that we need to figure out how we want to handle the
case where mark_page_dirty_in_slot() is invoked without a running vCPU on a memslot
without a dirty_bitmap.

I.e. what's an appropriate action in the below sequence:

void mark_page_dirty_in_slot(struct kvm *kvm,
			     const struct kvm_memory_slot *memslot,
		 	     gfn_t gfn)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

#ifdef CONFIG_HAVE_KVM_DIRTY_RING
	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
		return;

#ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
	if (WARN_ON_ONCE(!vcpu))
		return;
#endif
#endif

	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
		unsigned long rel_gfn = gfn - memslot->base_gfn;
		u32 slot = (memslot->as_id << 16) | memslot->id;

		if (vcpu && kvm->dirty_ring_size)
			kvm_dirty_ring_push(&vcpu->dirty_ring,
					    slot, rel_gfn);
		else if (memslot->dirty_bitmap)
			set_bit_le(rel_gfn, memslot->dirty_bitmap);
		else
			???? <=================================================
	}
}

Would it be possible to require a dirty bitmap when an ITS is created?  That would
allow treating the above condition as a KVM bug.

> > > @@ -4499,6 +4507,11 @@ static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
> > >  {
> > >  	int r;
> > >  
> > > +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> > > +	if (!kvm->dirty_ring_with_bitmap)
> > > +		return -EINVAL;
> > > +#endif
> > 
> > This one at least is prettier with IS_ENABLED
> > 
> > 	if (IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) &&
> > 	    !kvm->dirty_ring_with_bitmap)
> > 		return -EINVAL;
> > 
> > But dirty_ring_with_bitmap really shouldn't need to exist.  It's
> > mandatory for architectures that have
> > HAVE_KVM_DIRTY_RING_WITH_BITMAP, and unsupported for architectures
> > that don't.  In other words, the API for enabling the dirty ring is
> > a bit ugly.
> > 
> > Rather than add KVM_CAP_DIRTY_LOG_RING_ACQ_REL, which hasn't been
> > officially released yet, and then KVM_CAP_DIRTY_LOG_ING_WITH_BITMAP
> > on top, what about usurping bits 63:32 of cap->args[0] for flags?
> > E.g.

For posterity, filling in my missing idea...

Since the size is restricted to be well below a 32-bit value, and it's unlikely
that KVM will ever support 4GiB per-vCPU rings, we could usurp the upper bits for
flags:

  static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u64 arg0)
  {
	u32 flags = arg0 >> 32;
	u32 size = arg0;

However, since it sounds like enabling dirty_bitmap isn't strictly required, I
have no objection to enabling KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, my objection
was purely that KVM was adding a per-VM flag just to sanity check the configuration.

> > Ideally we'd use cap->flags directly, but we screwed up with
> > KVM_CAP_DIRTY_LOG_RING and didn't require flags to be zero :-(
> >
> > Actually, what's the point of allowing
> > KVM_CAP_DIRTY_LOG_RING_ACQ_REL to be enabled?  I get why KVM would
> > enumerate this info, i.e. allowing checking, but I don't seen any
> > value in supporting a second method for enabling the dirty ring.
> > 
> > The acquire-release thing is irrelevant for x86, and no other
> > architecture supports the dirty ring until this series, i.e. there's
> > no need for KVM to detect that userspace has been updated to gain
> > acquire-release semantics, because the fact that userspace is
> > enabling the dirty ring on arm64 means userspace has been updated.
> 
> Do we really need to make the API more awkward? There is an
> established pattern of "enable what is advertised". Some level of
> uniformity wouldn't hurt, really.

I agree that uniformity would be nice, but for capabilities I don't think that's
ever going to happen.  I'm pretty sure supporting enabling is actually in the
minority.  E.g. of the 20 capabilities handled in kvm_vm_ioctl_check_extension_generic(),
I believe only 3 support enabling (KVM_CAP_HALT_POLL, KVM_CAP_DIRTY_LOG_RING, and
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2).
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm