Re: [PATCH v2 1/1] KVM: Implement dirty quota-based throttling of vcpus

On Tue, Jan 11, 2022, Shivam Kumar wrote:
> On 10/01/22 11:38 pm, Sean Christopherson wrote:
> > > +			vcpu->run->exit_reason = KVM_EXIT_DIRTY_QUOTA_FULL;
> > "FULL" is a bit of a misnomer; there's nothing being filled.  REACHED is better,
> > though not perfect, because the quota can be exceeded if multiple pages are dirtied
> > in a single run.  Maybe just KVM_EXIT_DIRTY_QUOTA?
> 
> Absolutely. Does KVM_EXIT_DIRTY_QUOTA_EXHAUSTED look good? Or should I keep it just
> KVM_EXIT_DIRTY_QUOTA?

EXHAUSTED is good.

> > This stat should count regardless of whether dirty logging is enabled.  First and
> > foremost, counting only while dirty logging is enabled creates funky semantics for
> > KVM_EXIT_DIRTY_QUOTA, e.g. a vCPU can take exits even though no memslots have dirty
> > logging enabled (due to past counts), and a vCPU can dirty enormous amounts of
> > memory without exiting due to the relevant memslot not being dirty logged.
> > 
> > Second, the stat could be useful for determining initial quotas or just for debugging.
> > There's the huge caveat that the counts may be misleading if there's nothing clearing
> > the dirty bits, but I suspect the info would still be helpful.
> > 
> > Speaking of caveats, this needs documentation in Documentation/virt/kvm/api.rst.
> > One thing that absolutely needs to be covered is that this quota is not a hard limit,
> > and that it is enforced opportunistically, e.g. with VMX's PML enabled, a vCPU can go
> > up to 511 (or 510? I hate math) counts over its quota.
> 
> I think section 5 (the kvm_run structure) would be the right place to document this.
> Please confirm.

Yep.

> > The "(if dirty quota throttling is enabled)" is stale, this is the enable.
> 
> dirty_quota will be reset to zero at the end of live migration. I added this
> to capture this scenario.

That's irrelevant with respect to KVM's responsibilities.  More below.

> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f079820f52b5..7449b9748ddf 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -424,6 +424,20 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> >   	return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
> >   }
> > 
> > +static inline int kvm_vcpu_check_dirty_quota(struct kvm_vcpu *vcpu)
> > +{
> > +	u64 dirty_quota = READ_ONCE(vcpu->run->dirty_quota);
> 
> I think we can store vcpu->stat.generic.pages_dirtied too in some variable here, and
> use that in the next statements.

Yep.

> > @@ -508,6 +514,12 @@ struct kvm_run {
> >   		struct kvm_sync_regs regs;
> >   		char padding[SYNC_REGS_SIZE_BYTES];
> >   	} s;
> > +	/*
> > +	 * Number of pages the vCPU is allowed to have dirtied over its entire
> > +	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA if the quota is
> > +	 * reached/exceeded.
> > +	 */
> > +	__u64 dirty_quota;
> >   };
> 
> As mentioned above, this doesn't capture resetting of dirty_quota. Please let
> me know if that can be ignored here and captured in the documentation.

Capture it in the documentation.  Similar to the "reset to zero at the end of
live migration" comment, that's the behavior of _one_ userspace VMM, and isn't
fully representative of KVM's responsibilities.  From a KVM perspective, we can't
make any assumptions about how exactly userspace will utilize a feature, what
matters is what is/isn't supported by KVM's ABI.  E.g. for this particular comment,
KVM needs to play nice with userspace setting dirty_quota to any arbitrary value,
and functionally at any time as well since the quota is checked inside the run
loop, hence the suggestion to use READ_ONCE().

It's certainly helpful to document how a feature can be used, especially for users
and VMM developers, but that belongs in the "official" documentation, not internal
code comments.

Speaking of VMM behavior, this also needs a selftest.  I'm guessing it'll be easier
and cleaner to add a new test instead of enhancing something like dirty_log_test,
but whatever works.

> >   /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 168d0ab93c88..aa526b5b5518 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3163,7 +3163,12 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >   	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> >   		return;
> > 
> > -	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
> > +	if (!memslot)
> > +		return;
> > +
> > +	vcpu->stat.generic.pages_dirtied++;
> > +
> > +	if (kvm_slot_dirty_track_enabled(memslot)) {
> >   		unsigned long rel_gfn = gfn - memslot->base_gfn;
> >   		u32 slot = (memslot->as_id << 16) | memslot->id;
> > 
> > --
> > 
> This looks superb. I am very thankful to you for your reviews. As a beginner in linux
> kernel development, I have learnt a lot from your suggestions.
> 
> I'll be sending this with the documentation bit shortly. Please let me know if I can
> add you in the "Reviewed-by" list in the next patch.

No :-)  A Reviewed-by can be speculatively/conditionally provided, but generally
speaking that's only done if the expected delta is minimal and/or unlikely to have
any impact on the functionality.  That's also a good guideline for carrying reviews
across revisions of a patch: it's usually ok to retain a R-b if you rebase a patch,
tweak a comment/changelog, make minor revisions such as renaming a variable, etc...,
but you should drop a R-b (or explicitly ask as you've done here) if you make
non-trivial changes to a patch and/or modify the functionality in some way.


