Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus

Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> · Sun, 12 Feb 2023 23:26:51 +0530

On 30/01/23 3:30 am, Shivam Kumar wrote:


On 15/01/23 3:26 pm, Marc Zyngier wrote:
On Sat, 14 Jan 2023 13:07:44 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:



On 08/01/23 3:14 am, Marc Zyngier wrote:
On Sat, 07 Jan 2023 17:24:24 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:
On 26/12/22 3:37 pm, Marc Zyngier wrote:
On Sun, 25 Dec 2022 16:50:04 +0000,
Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote:

Hi Marc,
Hi Sean,

Please let me know if there's any further question or feedback.

My earlier comments still stand: the proposed API is not usable as a
general purpose memory-tracking API because it counts faults instead
of memory, making it inadequate except for the most trivial cases.
And I cannot believe you were serious when you mentioned that you 
were
happy to make that the API.

This requires some serious work, and this series is not yet near a
state where it could be merged.

Thanks,

    M.


Hi Marc,

IIUC, in the dirty ring interface too, the dirty_index variable is
incremented in the mark_page_dirty_in_slot function and it is also
count-based. At least on x86, I am aware that for dirty tracking we
have uniform granularity as huge pages (2MB pages) too are broken into
4K pages and bitmap is at 4K-granularity. Please let me know if it is
possible to have multiple page sizes even during dirty logging on
ARM. And if that is the case, I am wondering how we handle the bitmap
with different page sizes on ARM.

Easy. It *is* page-size, by the very definition of the API which
explicitly says that a single bit represent one basic page. If you
were to only break 1GB mappings into 2MB blocks, you'd have to mask
512 pages dirty at once, no question asked.

Your API is different because at no point it implies any relationship
with any page size. As it stands, it is a useless API. I understand
that you are only concerned with your particular use case, but that's
nowhere good enough. And it has nothing to do with ARM. This is
equally broken on *any* architecture.

I agree that the notion of pages dirtied according to our
pages_dirtied variable depends on how we are handling the bitmap but
we expect the userspace to use the same granularity at which the dirty
bitmap is handled. I can capture this in documentation

But what does the bitmap have to do with any of this? This is not what
your API is about. You are supposed to count dirtied memory, and you
are counting page faults instead. No sane userspace can make any sense
of that. You keep coupling the two, but that's wrong. This thing has
to be useful on its own, not just for your particular, super narrow
use case. And that's a shame because the general idea of a dirty quota
is an interesting one.

If your sole intention is to capture in the documentation that the API
is broken, then all I can do is to NAK the whole thing. Until you turn
this page-fault quota into the dirty memory quota that you advertise,
I'll continue to say no to it.

Thanks,

    M.


Thank you Marc for the suggestion. We can make dirty quota count
dirtied memory rather than faults.

run->dirty_quota -= page_size;

We can raise a kvm request for exiting to userspace as soon as the
dirty quota of the vcpu becomes zero or negative. Please let me know
if this looks good to you.

It really depends what "page_size" represents here. If you mean
"mapping size", then yes. If you really mean "page size", then no.

Assuming this is indeed "mapping size", then it all depends on how
this is integrated and how this is managed in a generic, cross
architecture way.

Thanks,

    M.


Hi Marc,

I'm proposing this new implementation to address the concern you raised 
regarding dirty quota being a non-generic feature with the previous 
implementation. This implementation decouples dirty quota from dirty 
logging for the ARM64 arch. We shall post a similar implementation for 
x86 if this looks good. With this new implementation, dirty quota can be 
enforced independent of dirty logging. Dirty quota is now in bytes and 
is decreased at write-protect page fault by page fault granularity. For 
userspace, the interface is unchanged, i.e. the dirty quota can be set 
from userspace via an ioctl or by forcing the vcpu to exit to userspace; 
userspace can expect a KVM exit with exit reason 
KVM_EXIT_DIRTY_QUOTA_EXHAUSTED when the dirty quota is exhausted.

Please let me know if it looks good to you. Happy to hear any further
feedback and work on it. Also, I am curious about use case scenarios 
other than dirty tracking for dirty quota. Besides, I am not aware of 
any interface exposed to the userspace, other than the dirty 
tracking-related ioctls, to write-protect guest pages transiently 
(unlike mprotect, which will generate a SIGSEGV signal on write).

Thanks,
Shivam


---
  arch/arm64/kvm/mmu.c     |  1 +
  include/linux/kvm_host.h |  1 +
  virt/kvm/kvm_main.c      | 12 ++++++++++++
  3 files changed, 14 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..edd88529d622 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1336,6 +1336,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
          /* Mark the page dirty only if the fault is handled 
successfully */
          if (writable && !ret) {
                  kvm_set_pfn_dirty(pfn);
+               update_dirty_quota(kvm, fault_granule);
                  mark_page_dirty_in_slot(kvm, memslot, gfn);
          }

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b9b5c251a04..10fda457ac3d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1219,6 +1219,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm 
*kvm, gfn_t gfn);
  bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
  bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
  unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
+void update_dirty_quota(struct kvm *kvm, unsigned long 
dirty_granule_bytes);
  void mark_page_dirty_in_slot(struct kvm *kvm, const struct 
kvm_memory_slot *memslot, gfn_t gfn);
  void mark_page_dirty(struct kvm *kvm, gfn_t gfn);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7a54438b4d49..377cc9d07e80 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3309,6 +3309,18 @@ static bool 
kvm_vcpu_is_dirty_quota_exhausted(struct kvm_vcpu *vcpu)
  #endif
  }

+void update_dirty_quota(struct kvm *kvm, unsigned long 
dirty_granule_bytes)
+{
+    if (kvm->dirty_quota_enabled) {
+        struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+           if (!vcpu)
+                   return;
+
+           vcpu->run->dirty_quota_bytes -= dirty_granule_bytes;
+           if (vcpu->run->dirty_quota_bytes <= 0)
+                           kvm_make_request(KVM_REQ_DIRTY_QUOTA_EXIT, 
vcpu);
+    }
+}
+
  void mark_page_dirty_in_slot(struct kvm *kvm,
                               const struct kvm_memory_slot *memslot,
                         gfn_t gfn)


Hi Marc,

Please review the above code.

Thanks,
Shivam