Re: [RFC PATCH v3 01/11] KVM: Capture VM start

Raghavendra Rao Ananta <rananta@xxxxxxxxxx> · Wed, 12 Jan 2022 10:08:32 -0800

On Tue, Jan 11, 2022 at 11:04 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jan 11, 2022, Raghavendra Rao Ananta wrote:
> > On Tue, Jan 11, 2022 at 9:36 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > In your proposed patch, KVM_RUN will take kvm->lock _every_ time.  That introduces
> > > unnecessary contention as it will serialize this bit of code if multiple vCPUs
> > > are attempting KVM_RUN.  By checking !vm_started, only the "first" KVM_RUN for a
> > > VM will acquire kvm->lock and thus avoid contention once the VM is up and running.
> > > There's still a possibility that multiple vCPUs will contend for kvm->lock on their
> > > first KVM_RUN, hence the quotes.  I called it "naive" because it's possible there's
> > > a more elegant solution depending on the use case, e.g. a lockless approach might
> > > work (or it might not).
> > >
> > But is it safe to read kvm->vm_started without grabbing the lock in
> > the first place?
>
> Don't know, but that's my point.  Without a consumer in generic KVM and due to
> my lack of arm64 knowledge, without a high-level description of how the flag will
> be used by arm64, it's really difficult to determine what's safe and what's not.
> For other architectures, it's an impossible question to answer because we don't
> know how the flag might be used.
>
> > use atomic_t maybe for this?
>
> No.  An atomic_t is generally useful only if there are multiple writers that can
> possibly write different values.  It's highly unlikely that simply switching to an
> atomic address the needs of arm64.
>
> > > > > > +                     kvm->vm_started = true;
> > > > > > +                     mutex_unlock(&kvm->lock);
> > > > >
> > > > > Lastly, why is this in generic KVM?
> > > > >
> > > > The v1 of the series originally had it in the arm specific code.
> > > > However, I was suggested to move it to the generic code since the book
> > > > keeping is not arch specific and could be helpful to others too [1].
> > >
> > > I'm definitely in favor of moving/adding thing to generic KVM when it makes sense,
> > > but I'm skeptical in this particular case.  The code _is_ arch specific in that
> > > arm64 apparently needs to acquire kvm->lock when checking if a vCPU has run, e.g.
> > > versus a hypothetical x86 use case that might be completely ok with a lockless
> > > implementation.  And it's not obvious that there's a plausible, safe use case
> > > outside of arm64, e.g. on x86, there is very, very little that is truly shared
> > > across the entire VM/system, most things are per-thread/core/package in some way,
> > > shape, or form.  In other words, I'm a wary of providing something like this for
> > > x86 because odds are good that any use will be functionally incorrect.
> > I've been going back and forth on this. I've seen a couple of
> > variables declared in the generic struct and used only in the arch
> > code. vcpu->valid_wakeup for instance, which is used only by s390
> > arch. Maybe I'm looking at it the wrong way as to what can and can't
> > go in the generic kvm code.
>
> Ya, valid_wakeup is an oddball, I don't know why it's in kvm_vcpu instead of
> arch code that's wrapped with e.g. kvm_arch_vcpu_valid_wakeup().
>
> That said, valid_wakeup is consumed by generic KVM, i.e. has well defined semantics
> for how it is used, so it's purely a "this code is rather odd" issue.  vm_started
> on the other hand is only produced by generic KVM, and so its required semantics are
> unclear.

Understood. I'll move it to arm64 and we can refactor it if there's a
need for any other arch(s).

Thanks,
Raghavendra