Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus

Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> · Sat, 7 Jan 2023 22:54:24 +0530

On 26/12/22 3:37 pm, Marc Zyngier wrote:
On Sun, 25 Dec 2022 16:50:04 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:

On 08/12/22 1:00 pm, Shivam Kumar wrote:

On 08/12/22 1:23 am, Sean Christopherson wrote:
On Wed, Dec 07, 2022, Marc Zyngier wrote:
On Tue, 06 Dec 2022 06:22:45 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:
You need to define the granularity of the counter, and account for
each fault according to its mapping size. If an architecture has 16kB
as the base page size, a 32MB fault (the size of the smallest block
mapping) must bump the counter by 2048. That's the only way userspace
can figure out what is going on.

I don't think that's true for the dirty logging case.  IIUC, when a
memslot is
being dirty logged, KVM forces the memory to be mapped with
PAGE_SIZE granularity,
and that base PAGE_SIZE is fixed and known to userspace.
I.e. accuracy is naturally
provided for this primary use case where accuracy really matters,
and so this is
effectively a documentation issue and not a functional issue.

So, does defining "count" as "the number of write permission faults"
help in addressing the documentation issue? My understanding too is
that for dirty logging, we will have uniform granularity.

Thanks.

Without that, you may as well add a random number to the counter, it
won't be any worse.

The stat will be wildly inaccurate when dirty logging isn't
enabled, but that doesn't
necessarily make the stat useless, e.g. it might be useful as a
very rough guage
of which vCPUs are likely to be writing memory.  I do agree though
that the value
provided is questionable and/or highly speculative.

[...]

If you introduce additional #ifdefery here, why are the additional
fields in the vcpu structure unconditional?

pages_dirtied can be a useful information even if dirty quota
throttling is not used. So, I kept it unconditional based on
feedback.

Useful for whom? This creates an ABI for all architectures, and this
needs buy-in from everyone. Personally, I think it is a pretty useless
stat.

When we started this patch series, it was a member of the kvm_run
struct. I made this a stat based on the feedback I received from the
reviews. If you think otherwise, I can move it back to where it was.

I'm certainly totally opposed to stats that don't have a clear use
case. People keep piling random stats that satisfy their pet usage,
and this only bloats the various structures for no overall benefit
other than "hey, it might be useful". This is death by a thousand cut.

I don't have a strong opinion on putting the counter into kvm_run
as an "out"
fields vs. making it a state.  I originally suggested making it a
stat because
KVM needs to capture the information somewhere, so why not make it
a stat?  But
I am definitely much more cavalier when it comes to adding stats,
so I've no
objection to dropping the stat side of things.

I'll be skeptical about making it a stat if we plan to allow the
userspace to reset it at will.

Thank you so much for the comments.

Thanks,
Shivam

Hi Marc,
Hi Sean,

Please let me know if there's any further question or feedback.

My earlier comments still stand: the proposed API is not usable as a
general purpose memory-tracking API because it counts faults instead
of memory, making it inadequate except for the most trivial cases.
And I cannot believe you were serious when you mentioned that you were
happy to make that the API.

This requires some serious work, and this series is not yet near a
state where it could be merged.

Thanks,

	M.

Hi Marc,

IIUC, in the dirty ring interface too, the dirty_index variable is 
incremented in the mark_page_dirty_in_slot function and it is also 
count-based. At least on x86, I am aware that for dirty tracking we have 
uniform granularity as huge pages (2MB pages) too are broken into 4K 
pages and bitmap is at 4K-granularity. Please let me know if it is 
possible to have multiple page sizes even during dirty logging on ARM. 
And if that is the case, I am wondering how we handle the bitmap with 
different page sizes on ARM.

I agree that the notion of pages dirtied according to our pages_dirtied 
variable depends on how we are handling the bitmap but we expect the 
userspace to use the same granularity at which the dirty bitmap is 
handled. I can capture this in documentation

CC: Peter Xu

Thanks,
Shivam