Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus

On 22/11/22 11:16 pm, Marc Zyngier wrote:
On Fri, 18 Nov 2022 09:47:50 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:



On 18/11/22 12:56 am, Marc Zyngier wrote:
On Sun, 13 Nov 2022 17:05:06 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:

+    count: the current count of pages dirtied by the VCPU, can be
+    skewed based on the size of the pages accessed by each vCPU.

How can userspace make a decision on the amount of dirtying this
represent if this doesn't represent a number of base pages? Or are you
saying that this only counts the number of permission faults that have
dirtied pages?

Yes, this only counts the number of permission faults that have
dirtied pages.

So how can userspace consistently set a quota of dirtied memory? This
has to account for the size that has been faulted, because that's all
userspace can reason about. Remember that at least on arm64, we're
dealing with 3 different base page sizes, and many more large page
sizes.

I understand that this helps only when the majority of the dirtying happens at the same page size. In our use case (VM live migration), even a large page is broken into 4k pages on its first dirtying. If required in the future, we can add individual counters for the different page sizes.

Thanks for pointing this out.
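
For illustration only (not part of this series), a userspace-side sketch of sizing the quota under that assumption, with the count effectively tracking 4k pages; the helper name and constants are made up for the example:

#include <stdint.h>

/*
 * Convert a per-vCPU dirty-rate budget (bytes per second) over one
 * throttling interval into a number of pages, assuming huge pages are
 * split into 4k pages on the first write as described above.
 */
static uint64_t dirty_quota_pages(uint64_t bytes_per_sec, uint64_t interval_ms)
{
	return (bytes_per_sec * interval_ms) / (1000ULL * 4096ULL);
}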

+    quota: the observed dirty quota just before the exit to
userspace.

You are defining the quota in terms of quota. -ENOCLUE.

I am defining the "quota" member of the dirty_quota_exit struct in
terms of "dirty quota" which is already defined in the commit
message.

Which nobody will see. This is supposed to be self-contained
documentation.
Ack. Thanks.

+The userspace can design a strategy to allocate the overall scope of dirtying
+for the VM among the vcpus. Based on the strategy and the current state of dirty
+quota throttling, the userspace can make a decision to either update (increase)
+the quota or to put the VCPU to sleep for some time.

This looks like something out of 1984 (Newspeak anyone)? Can't you
just say that userspace is responsible for allocating the quota and
manage the resulting throttling effect?

We didn't intend to sound like the Party or Big Brother. We started working on the Linux and QEMU patches at the same time and got tempted into exposing the details of how we were using this feature for throttling in QEMU. I can drop those details if it helps.

I think the details are meaningless, and this should stick to the API,
not the way the API could be used.

Ack. Thanks.
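
For reference, roughly this kind of handling is what was meant, assuming the exit payload is exposed as run->dirty_quota_exit with the count and quota members discussed in this thread; throttle_sleep(), interval_time_left_ns() and pages_allowed_per_interval() are made-up VMM helpers, not part of the patch:

#include <stdint.h>
#include <linux/kvm.h>

/* Hypothetical VMM helpers standing in for the actual throttling policy. */
extern int64_t interval_time_left_ns(int vcpu_fd);
extern void throttle_sleep(int vcpu_fd);
extern uint64_t pages_allowed_per_interval(int vcpu_fd);

/*
 * Sketch: handle KVM_EXIT_DIRTY_QUOTA_EXHAUSTED on a vCPU whose mmap'ed
 * kvm_run is "run", then re-enter KVM_RUN as usual.
 */
static void handle_dirty_quota_exit(struct kvm_run *run, int vcpu_fd)
{
	uint64_t count = run->dirty_quota_exit.count;	/* pages dirtied so far */
	uint64_t quota = run->dirty_quota_exit.quota;	/* quota observed at exit */

	/*
	 * The exit fires once the quota is reached/exceeded.  If the vCPU
	 * burnt through its allocation before the current throttling
	 * interval ended, make it wait out the remainder.
	 */
	if (count >= quota && interval_time_left_ns(vcpu_fd) > 0)
		throttle_sleep(vcpu_fd);

	/* Grant the next interval's allocation on top of what was dirtied. */
	run->dirty_quota = count + pages_allowed_per_interval(vcpu_fd);
}

Whether to sleep, to shrink the next allocation, or to do both is entirely the VMM's policy; KVM only reports count and quota.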

+	/*
+	 * Number of pages the vCPU is allowed to have dirtied over its entire
+	 * lifetime.  KVM_RUN exits with KVM_EXIT_DIRTY_QUOTA_EXHAUSTED if the quota
+	 * is reached/exceeded.
+	 */
+	__u64 dirty_quota;

How are dirty_quota and dirty_quota_exit.quota related?


dirty_quota_exit.quota is the dirty quota at the time of the exit. We
are capturing it for userspace's reference because the dirty quota can
be updated at any time.

Shouldn't that be described here?

Ack. Thanks.
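
i.e. something along these lines next to the uAPI definition (just a draft of the wording; the struct name is whatever this series already uses for the dirty_quota_exit member):

/* Exit payload for KVM_EXIT_DIRTY_QUOTA_EXHAUSTED. */
struct kvm_dirty_quota_exit {
	/*
	 * Pages dirtied by the vCPU over its lifetime, counted once per
	 * permission fault that dirtied a page, irrespective of the page
	 * size that faulted.
	 */
	__u64 count;
	/*
	 * Value of kvm_run.dirty_quota observed at the time of the exit.
	 * It is snapshotted here because userspace can update the quota
	 * at any time while the vCPU is running.
	 */
	__u64 quota;
};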

+#endif

If you introduce additional #ifdefery here, why are the additional
fields in the vcpu structure unconditional?

pages_dirtied can be useful information even if dirty quota throttling
is not used, so I kept it unconditional based on review feedback.

Useful for whom? This creates an ABI for all architectures, and this
needs buy-in from everyone. Personally, I think it is a pretty useless
stat.

When we started this patch series, pages_dirtied was a member of the kvm_run struct. I made it a stat based on the feedback I received during review. If you think otherwise, I can move it back to where it was.

Thanks.

And while we're talking about pages_dirtied, I really dislike the
WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over?
Shock, horror...

Ack. I'll give it a thought, but if you have any specific suggestion on how to make it better, kindly let me know. Thanks.


CC: Sean

I can add #ifdefery around dirty_quota in the vcpu run struct.

   		else
   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
+
+		if (kvm_vcpu_is_dirty_quota_exhausted(vcpu))
+			kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);

This is broken in the light of new dirty-tracking code queued for
6.2. Specifically, you absolutely can end-up here *without* a vcpu on
arm64. You just have to snapshot the ITS state to observe the fireworks.

Could you please point me to the patchset which is in queue?

The patches are in -next, and you can look at the branch here[1].
Please also refer to the discussion on the list, as a lot of what was
discussed there does apply here.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/dirty-ring


Thank you so much for the information. I went through the patches, and I feel an additional check on vcpu in the "if" condition will eliminate the issue:

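/*
 * Dirtying can happen without a running vCPU (e.g. saving the ITS
 * state on arm64), so skip the quota check in that case.
 */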
if (vcpu && kvm_vcpu_is_dirty_quota_exhausted(vcpu))
	kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, vcpu);

Happy to know your thoughts.


Thank you so much, Marc, for the review.


Thanks,
Shivam


