On Wed, 02 Nov 2022 16:11:07 +0000,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Nov 02, 2022, Marc Zyngier wrote:
> > On Wed, 02 Nov 2022 14:29:26 +0000, Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > However, I don't see anything that stops a simple "race" like the
> > > one below from triggering:
> > >
> > >     recycle thread                     vcpu thread
> > >     --------------                     -----------
> > >     if (!dirty_ring_soft_full)                        <--- not full
> > >                                        dirty_ring_push();
> > >                                        if (dirty_ring_soft_full)
> > >                                                        <--- full due to the push
> > >                                            set_request(SOFT_FULL);
> > >     clear_request(SOFT_FULL);                         <--- can wrongly clear the request?
> > >
> >
> > Hmmm, well spotted. That's another ugly effect of the recycle thread
> > playing with someone else's toys.
> >
> > > But I don't think that's a huge matter, as it'll just let the vcpu
> > > have one more chance to do another round of KVM_RUN. Normally I
> > > think it means there can be one more dirty GFN (perhaps there are
> > > cases where it can push >1 GFNs in one KVM_RUN cycle? I never
> > > figured out the details here, but still..) pushed to the ring, so
> > > we get closer to the hard limit, but we have a buffer zone of
> > > KVM_DIRTY_RING_RSVD_ENTRIES (64) entries. So I assume that's still
> > > fine, but maybe it's worth a short comment here?
> > >
> > > I never knew the maximum possible number of GFNs that can be
> > > dirtied in one KVM_RUN cycle. It would be good if anyone has an
> > > answer to that.
> >
> > This is dangerous, and I'd rather not go there.
> >
> > It is starting to look like we need the recycle thread to get out of
> > the way. And to be honest:
> >
> > +	if (!kvm_dirty_ring_soft_full(ring))
> > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> >
> > seems rather superfluous. Only clearing the flag in the vcpu entry
> > path feels much saner, and I can't see anything that would break.
> >
> > Thoughts?
>
> I've no objections to dropping the clear on reset; I suggested it
> primarily so that it would be easier to understand what action causes
> the dirty ring to become not-full. I agree that the explicit clear is
> unnecessary from a functional perspective.

The core of the issue is that the whole request mechanism is a
producer/consumer model, where consuming a request is a CLEAR action.
The standard model is that the vcpu thread is the consumer, and that
any thread (including the vcpu itself) can be a producer.

With this flag clearing being done on a non-vcpu thread, you end up
with two consumers, and things can go subtly wrong.

I'd suggest replacing this hunk with a comment saying that the request
will be cleared by the vcpu thread the next time it enters the guest.

	M.

-- 
Without deviation from the norm, progress is not possible.
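
For a concrete picture of the single-consumer model described above,
here is a minimal, stand-alone C sketch. It is purely illustrative:
REQ_SOFT_FULL and the helper names below are made up for this example
and are not the actual KVM APIs. The point it demonstrates is that any
thread may set the request bit, but only the vcpu thread clears it, on
its entry path, so a recycle thread can never consume a request behind
the vcpu's back:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Hypothetical request bit, standing in for KVM_REQ_DIRTY_RING_SOFT_FULL. */
	#define REQ_SOFT_FULL	(1u << 0)

	struct vcpu {
		atomic_uint requests;	/* stands in for vcpu->requests */
	};

	/* Producer side: any thread may raise a request (a SET action). */
	static void make_request(struct vcpu *vcpu, unsigned int req)
	{
		atomic_fetch_or(&vcpu->requests, req);
	}

	/*
	 * Consumer side: only the vcpu thread calls this, on its entry
	 * path. Consuming the request is the only CLEAR action, so there
	 * is exactly one consumer and a recycle thread cannot wrongly
	 * clear a freshly set bit.
	 */
	static bool check_and_clear_request(struct vcpu *vcpu, unsigned int req)
	{
		if (!(atomic_load(&vcpu->requests) & req))
			return false;
		atomic_fetch_and(&vcpu->requests, ~req);
		return true;
	}

	/* Sketch of the vcpu entry path. */
	static void vcpu_enter_guest(struct vcpu *vcpu)
	{
		if (check_and_clear_request(vcpu, REQ_SOFT_FULL)) {
			/* Exit to userspace so the ring can be harvested. */
			printf("ring soft-full: exiting to userspace\n");
			return;
		}
		printf("entering guest\n");
	}

	int main(void)
	{
		struct vcpu vcpu = { .requests = 0 };

		make_request(&vcpu, REQ_SOFT_FULL);	/* producer: e.g. after a push */
		vcpu_enter_guest(&vcpu);		/* consumer: clears on entry */
		vcpu_enter_guest(&vcpu);		/* request already consumed */
		return 0;
	}

Under this model, the hunk quoted above disappears entirely: the
recycle thread never touches the request word, and the flag is simply
consumed (and acted upon) at the next guest entry, as suggested in the
reply.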