On Thu, Feb 16, 2023, Anish Moorthy wrote:
> On Wed, Feb 15, 2023 at 12:59 AM Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > index 109b18e2789c4..9352e7f8480fb 100644
> > > > --- a/include/linux/kvm_host.h
> > > > +++ b/include/linux/kvm_host.h
> > > > @@ -801,6 +801,9 @@ struct kvm {
> > > >  	bool vm_bugged;
> > > >  	bool vm_dead;
> > > >
> > > > +	rwlock_t mem_fault_nowait_lock;
> > > > +	bool mem_fault_nowait;
> > >
> > > A full-fat rwlock to protect a single bool? What benefits do you
> > > expect from a rwlock? Why is it preferable to an atomic access, or a
> > > simple bitop?
> >
> > There's no need to have any kind of dedicated atomicity. The only
> > readers are in vCPU context, just disallow KVM_CAP_MEM_FAULT_NOWAIT
> > after vCPUs are created.
>
> I think we do need atomicity here.

Atomicity, yes. Mutual exclusivity, no. AFAICT, nothing will break if
userspace has multiple in-flight calls to toggle the flag. And if we do
want to guarantee there's only one writer, then kvm->lock or
kvm->slots_lock will suffice.

> When KVM_CAP_MEM_FAULT_NOWAIT is enabled async page faults are
> essentially disabled: so userspace will likely want to disable the cap
> at some point (such as the end of live migration post-copy).

Ah, this is a dynamic thing and not a set-and-forget thing.

> Since we want to support this without having to pause vCPUs, there's an
> atomicity requirement.

Ensuring that vCPUs "see" the new value and not corrupting memory are
two very different things. Making the flag an atomic, wrapping it with a
rwlock, etc... do nothing to ensure vCPUs observe the new value. And for
non-crazy usage of bools, they're not even necessary to avoid memory
corruption, e.g. the result of concurrent writes to a bool is
non-deterministic, but so is the order of two tasks contending for a
lock, so it's a moot point.

I think what you really want to achieve is that vCPUs observe the NOWAIT
flag before KVM returns to userspace. There are a variety of ways to
make that happen, but since this is all about accessing guest memory,
the simplest is likely to "protect" the flag with kvm->srcu, i.e.
require SRCU be held by readers and then do a synchronize_srcu() to
ensure all vCPUs have picked up the new value (rough sketch at the
bottom of this mail).

Speaking of SRCU (which protects memslots), why not make this a memslot
flag? If the goal is to be able to turn the behavior on/off dynamically,
wouldn't it be beneficial to turn off the NOWAIT behavior when a memslot
is fully transferred? A memslot flag would likely be simpler to
implement as it would piggyback all of the existing infrastructure to
handle memslot updates. See the second sketch below.
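For the SRCU idea, something like this completely untested sketch, which
assumes the flag stays a per-VM bool (mem_fault_nowait, as in the patch)
and uses kvm->lock to serialize writers:

static int kvm_vm_set_mem_fault_nowait(struct kvm *kvm, bool enable)
{
	mutex_lock(&kvm->lock);
	WRITE_ONCE(kvm->mem_fault_nowait, enable);
	mutex_unlock(&kvm->lock);

	/*
	 * Don't return to userspace until all vCPUs have exited their
	 * current SRCU read-side critical sections, i.e. are guaranteed
	 * to observe the new value on their next guest memory access.
	 */
	synchronize_srcu(&kvm->srcu);
	return 0;
}

/* Reader, in vCPU context with kvm->srcu already held. */
static bool kvm_mem_fault_nowait(struct kvm *kvm)
{
	return READ_ONCE(kvm->mem_fault_nowait);
}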
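And for the memslot flag variant, again untested, with a made-up flag
name and bit (bits 0 and 1 are already claimed by
KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY):

/* uapi: hypothetical new memslot flag, bit value not actually claimed. */
#define KVM_MEM_FAULT_NOWAIT	(1UL << 2)

/*
 * Fault-path check.  Memslots are already SRCU-protected, so readers
 * pick up flag changes through the existing memslot update machinery,
 * no new synchronization required.
 */
static bool kvm_slot_fault_nowait(const struct kvm_memory_slot *slot)
{
	return slot && (slot->flags & KVM_MEM_FAULT_NOWAIT);
}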