On Mon, Jan 13, 2025 at 08:28:06AM -0800, Sean Christopherson wrote:
> On Mon, Jan 13, 2025, Yan Zhao wrote:
> > On Fri, Jan 10, 2025 at 05:04:07PM -0800, Sean Christopherson wrote:
> > > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > > index a81ad17d5eef..37eb2b7142bd 100644
> > > --- a/virt/kvm/dirty_ring.c
> > > +++ b/virt/kvm/dirty_ring.c
> > > @@ -133,6 +133,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > >  
> > >  		ring->reset_index++;
> > >  		(*nr_entries_reset)++;
> > > +
> > > +		/*
> > > +		 * While the size of each ring is fixed, it's possible for the
> > > +		 * ring to be constantly re-dirtied/harvested while the reset
> > > +		 * is in-progress (the hard limit exists only to guard against
> > > +		 * wrapping the count into negative space).
> > > +		 */
> > > +		if (!first_round)
> > > +			cond_resched();
> > > +
> > Will cond_resched() per entry be too frequent?
>
> No, if it is too frequent, KVM has other problems.  cond_resched() only takes a
> handful of cycles when no work needs to be done, and on PREEMPTION=y kernels,
> dropping mmu_lock in kvm_reset_dirty_gfn() already includes a NEED_RESCHED check.
Ok. I was just worried about live migration performance, but per-entry looks
fine as well.

> > Could we combine the cond_resched() per ring? e.g.
> >
> > if (count >= ring->soft_limit)
> > 	cond_resched();
> >
> > or simply
> > while (count < ring->size) {
> > 	...
> > }
>
> I don't think I have any objections to bounding the reset at ring->size?  I
> assumed the unbounded walk was deliberate, e.g. to let userspace reset entries
> in a separate thread, but looking at the QEMU code, that doesn't appear to be
> the case.
Ok.

> However, IMO that's an orthogonal discussion.  I think KVM should still check for
> NEED_RESCHED after processing each entry regardless of how the loop is bounded.
> E.g. write-protecting 65536 GFNs is definitely going to have measurable latency.
Yes.
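
FWIW, for the ring->size bound I was picturing something like the below
(untested sketch against your patch, not a real diff: "count" is a local I
made up, "first_round" and "nr_entries_reset" are from the quoted diff, and
the entry lookup and harvested check follow the existing code in
virt/kvm/dirty_ring.c):

	u32 count = 0;

	/* One full pass over the ring is the natural upper bound. */
	while (count < ring->size) {
		entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];

		if (!kvm_dirty_gfn_harvested(entry))
			break;

		/* ... existing per-entry reset/batching logic ... */

		ring->reset_index++;
		count++;
		(*nr_entries_reset)++;

		/*
		 * Cheap when there's no pending work; only enters the
		 * scheduler when NEED_RESCHED is set.
		 */
		if (!first_round)
			cond_resched();
	}

A local count (rather than bounding on *nr_entries_reset) keeps the limit
per-ring even when the caller accumulates the total across multiple rings.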