On Wed, 02 Nov 2022 16:11:07 +0000,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Nov 02, 2022, Marc Zyngier wrote:
> > On Wed, 02 Nov 2022 14:29:26 +0000, Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > However, I don't see anything that stops a simple "race" like the
> > > one below from triggering:
> > >
> > >     recycle thread                     vcpu thread
> > >     --------------                     -----------
> > >     if (!dirty_ring_soft_full)                        <--- not full
> > >                                        dirty_ring_push();
> > >                                        if (dirty_ring_soft_full)
> > >                                                        <--- full due to the push
> > >                                            set_request(SOFT_FULL);
> > >     clear_request(SOFT_FULL);                         <--- can wrongly clear the request?
> > >
> >
> > Hmmm, well spotted. That's another ugly effect of the recycle thread
> > playing with someone else's toys.
> >
> > > But I don't think that's a huge matter, as it'll just let the vcpu
> > > have one more chance to do another round of KVM_RUN. Normally I
> > > think it means there can be one more dirty GFN (perhaps there are
> > > cases where it can push >1 GFNs in one KVM_RUN cycle? I never
> > > figured out the details here, but still..) pushed to the ring, so
> > > we get closer to the hard limit, but we have a buffer zone of
> > > KVM_DIRTY_RING_RSVD_ENTRIES (64) entries. So I assume that's still
> > > fine, but maybe it's worth a short comment here?
> > >
> > > I never knew the maximum possible number of GFNs that can be
> > > dirtied in one KVM_RUN cycle. It would be good if anyone has an
> > > answer to that.
> >
> > This is dangerous, and I'd rather not go there.
> >
> > It is starting to look like we need the recycle thread to get out of
> > the way. And to be honest:
> >
> > +	if (!kvm_dirty_ring_soft_full(ring))
> > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> >
> > seems rather superfluous. Only clearing the flag in the vcpu entry
> > path feels much saner, and I can't see anything that would break.
> >
> > Thoughts?
>
> I've no objections to dropping the clear on reset; I suggested it
> primarily so that it would be easier to understand what action causes
> the dirty ring to become not-full. I agree that the explicit clear is
> unnecessary from a functional perspective.

The core of the issue is that the whole request mechanism is a
producer/consumer model, where consuming a request is a CLEAR action.
The standard model is that the vcpu thread is the consumer, and that
any thread (including the vcpu itself) can be a producer.

With this flag clearing being done on a non-vcpu thread, you end up
with two consumers, and things can go subtly wrong.

I'd suggest replacing this hunk with a comment saying that the request
will be cleared by the vcpu thread the next time it enters the guest.

	M.

-- 
Without deviation from the norm, progress is not possible.
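
For a concrete picture of the single-consumer model described above,
here is a minimal, stand-alone C sketch. It is purely illustrative:
REQ_SOFT_FULL and the helper names below are made up for this example
and are not the actual KVM APIs. The point it demonstrates is that any
thread may set the request bit, but only the vcpu thread clears it, on
its entry path, so a recycle thread can never consume a request behind
the vcpu's back:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Hypothetical request bit, standing in for KVM_REQ_DIRTY_RING_SOFT_FULL. */
	#define REQ_SOFT_FULL	(1u << 0)

	struct vcpu {
		atomic_uint requests;	/* stands in for vcpu->requests */
	};

	/* Producer side: any thread may raise a request (a SET action). */
	static void make_request(struct vcpu *vcpu, unsigned int req)
	{
		atomic_fetch_or(&vcpu->requests, req);
	}

	/*
	 * Consumer side: only the vcpu thread calls this, on its entry
	 * path. Consuming the request is the only CLEAR action, so there
	 * is exactly one consumer and a recycle thread cannot wrongly
	 * clear a freshly set bit.
	 */
	static bool check_and_clear_request(struct vcpu *vcpu, unsigned int req)
	{
		if (!(atomic_load(&vcpu->requests) & req))
			return false;
		atomic_fetch_and(&vcpu->requests, ~req);
		return true;
	}

	/* Sketch of the vcpu entry path. */
	static void vcpu_enter_guest(struct vcpu *vcpu)
	{
		if (check_and_clear_request(vcpu, REQ_SOFT_FULL)) {
			/* Exit to userspace so the ring can be harvested. */
			printf("ring soft-full: exiting to userspace\n");
			return;
		}
		printf("entering guest\n");
	}

	int main(void)
	{
		struct vcpu vcpu = { .requests = 0 };

		make_request(&vcpu, REQ_SOFT_FULL);	/* producer: e.g. after a push */
		vcpu_enter_guest(&vcpu);		/* consumer: clears on entry */
		vcpu_enter_guest(&vcpu);		/* request already consumed */
		return 0;
	}

Under this model, the hunk quoted above disappears entirely: the
recycle thread never touches the request word, and the flag is simply
consumed (and acted upon) at the next guest entry, as suggested in the
reply.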