Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap

On Tue, 25 Oct 2022 00:50:29 +0100,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> 
> On Sat, Oct 22, 2022, Marc Zyngier wrote:
> > On Fri, 21 Oct 2022 17:05:26 +0100, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > 
> > > On Fri, Oct 21, 2022, Marc Zyngier wrote:
> > > > Because dirtying memory outside of a vcpu context makes it
> > > > incredibly awkward to handle a "ring full" condition?
> > > 
> > > Kicking all vCPUs with the soft-full request isn't _that_ awkward.
> > > It's certainly sub-optimal, but if inserting into the per-VM ring is
> > > relatively rare, then in practice it's unlikely to impact guest
> > > performance.
> > 
> > But there is *nothing* to kick here. The kernel is dirtying pages,
> > devices are dirtying pages (DMA), and there is no context associated
> > with that. Which is why a finite ring is the wrong abstraction.
> 
> I don't follow.  If there's a VM, KVM can always kick all vCPUs.
> Again, might be far from optimal, but it's an option.  If there's
> literally no VM, then KVM isn't involved at all and there's no "ring
> vs. bitmap" decision.

The key word is *device*. No vcpu is involved here. Actually, we
actively prevent save/restore of the ITS while vcpus are running. How
could you even expect to snapshot a consistent state if the interrupt
state is changing under your feet?

> 
> > > Would it be possible to require a dirty bitmap when an ITS is
> > > created?  That would allow treating the above condition as a KVM
> > > bug.
> > 
> > No. This should be optional. Everything about migration should be
> > absolutely optional (I run plenty of concurrent VMs on sub-2GB
> > systems). You want to migrate a VM that has an ITS or will collect
> > dirty bits originating from an SMMU with HTTU, you enable the dirty
> > bitmap. You want to have *vcpu* based dirty rings, you enable them.
> > 
> > In short, there shouldn't be any reason for the two to be either
> > mandatory or conflated. Both should be optional and independent,
> > because they cover completely disjoint use cases. *userspace* should
> > be in charge of deciding this.
> 
> I agree about userspace being in control, what I want to avoid is
> letting userspace put KVM into a bad state without any indication
> from KVM that the setup is wrong until something actually dirties a
> page.

I can't see how that can result in a bad state for KVM itself. All you
lack is a way for userspace to *track* the dirtying. Just like we
don't have a way to track the dirtying of a page from the VMM.

> Specifically, if mark_page_dirty_in_slot() is invoked without a
> running vCPU, on a memslot with dirty tracking enabled but without a
> dirty bitmap, then the migration is doomed.

Yup, and that's a luser error. Too bad. Userspace can still transfer
all the memory, and all will be fine.
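
For reference, the path we're arguing about is roughly the one below.
This is a from-memory sketch of the common helper, not the actual code
(names and fields are only indicative), but it shows where a dirty GFN
can end up:

	/* Sketch only -- roughly what mark_page_dirty_in_slot() does */
	void mark_page_dirty_in_slot(struct kvm *kvm,
				     const struct kvm_memory_slot *memslot,
				     gfn_t gfn)
	{
		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

		if (!memslot || !kvm_slot_dirty_track_enabled(memslot))
			return;

		if (vcpu && kvm->dirty_ring_size) {
			/* vcpu context: push into the per-vcpu ring */
			kvm_dirty_ring_push(&vcpu->dirty_ring,
					    (memslot->as_id << 16) | memslot->id,
					    gfn - memslot->base_gfn);
		} else if (memslot->dirty_bitmap) {
			/* no vcpu (ITS save, SMMU, ...): use the bitmap */
			set_bit_le(gfn - memslot->base_gfn,
				   memslot->dirty_bitmap);
		}
		/* otherwise: nothing to record, userspace never asked */
	}

No vcpu and no bitmap simply means there is nowhere to record the
write; that's the luser error above, not a KVM bug.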

> Dropping the dirty page isn't a sane response as that'd all but
> guarantee memory corruption in the guest.

Again, user error. Userspace can readily write over all the guest
memory (virtio), and no amount of KVM-side tracking will help. What
are you going to do about it?

At the end of the day, what are you trying to do? All the dirty
tracking muck (bitmap and ring) is only a way for userspace to track
dirty pages more easily and accelerate the transfer. If userspace
doesn't tell KVM to track these writes, tough luck. If the author of a
VMM doesn't understand that, then maybe they shouldn't be in charge of
the VMM. Worse case, they can still transfer the whole thing, no harm
done.
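
And "telling KVM to track these writes" is nothing more exotic than
setting a flag on the memslot and harvesting the bitmap later. A rough
userspace sketch (assuming vm_fd, host_mem, mem_size and bitmap are
already set up; error handling elided):

	/* <linux/kvm.h>, <sys/ioctl.h> assumed */
	struct kvm_userspace_memory_region region = {
		.slot            = 0,
		.flags           = KVM_MEM_LOG_DIRTY_PAGES, /* track writes */
		.guest_phys_addr = 0,
		.memory_size     = mem_size,
		.userspace_addr  = (__u64)host_mem,
	};

	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);

	/* later, while migrating: fetch the dirty bitmap for the slot */
	struct kvm_dirty_log log = {
		.slot         = 0,
		.dirty_bitmap = bitmap, /* one bit per page in the slot */
	};

	ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

If userspace never sets that flag, there is nothing KVM can usefully
do with a write it cannot attribute to a vcpu.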

> At best, KVM could kick all vCPUs out to userspace
> with a new exit reason, but that's not a very good experience for
> userspace as either the VM is unexpectedly unmigratable or the VM
> ends up being killed (or I suppose userspace could treat the exit as
> a per-VM dirty ring of size '1').

Can we please stop the exit nonsense? There is no vcpu involved
here. This is a device (emulated or not) writing to memory, triggered
by an ioctl from userspace. If you're thinking vcpu, you have the
wrong end of the stick.
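
Concretely, the trigger looks like this on the VMM side (sketch only;
its_fd is assumed to be the ITS device fd obtained earlier via
KVM_CREATE_DEVICE):

	struct kvm_device_attr attr = {
		.group = KVM_DEV_ARM_VGIC_GRP_CTRL,
		.attr  = KVM_DEV_ARM_ITS_SAVE_TABLES,
	};

	/* flushes the ITS tables into guest RAM, dirtying pages,
	 * without any vcpu ever entering the picture */
	ioctl(its_fd, KVM_SET_DEVICE_ATTR, &attr);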

Memory gets dirtied system wide, not just by CPUs, and no amount of
per-vcpu resource is going to solve this problem. VM-based rings could
help with this if they provided a way to recover from overflow. But that
obviously doesn't work here as we can't checkpoint and restart the
saving process on overflow.

	M.

-- 
Without deviation from the norm, progress is not possible.


