Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote:
Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
rmap: "kvm: rework remove-write-access for a slot"
       http://www.spinics.net/lists/kvm/msg35871.html

One problem pointed out there was that this approach might hurt cache locality
and make things slow down.

But if we restrict the story to dirty logging, we notice that only small
portion of pages are actually needed to be write protected.

For example, I have confirmed that even when we are playing with tools like
x11perf, dirty ratio of the frame buffer bitmap is almost always less than 10%.

In the case of live-migration, we will see more sparseness in the usual
workload because the RAM size is really big.

So this patch uses his approach with small modification to use switched out
dirty bitmap as a hint to restrict the rmap travel.

We can also use this to selectively write protect pages to reduce unwanted page
faults in the future.


Looks like a good approach.  Any measurements?


+static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte = rmap_next(kvm, rmapp, NULL);
+
+	while (spte) {
+		/* avoid RMW */
+		if (is_writable_pte(*spte))
+			*spte&= ~PT_WRITABLE_MASK;

This is racy, *spte can be modified concurrently by hardware.

update_spte() can be used for this.

+		spte = rmap_next(kvm, rmapp, spte);
+	}
+}
+
+/*
+ * Write protect the pages set dirty in a given bitmap.
+ */
+void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
+					   struct kvm_memory_slot *memslot,
+					   unsigned long *dirty_bitmap)
+{
+	int i;
+	unsigned long gfn_offset;
+
+	for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
+		rmapp_remove_write_access(kvm,&memslot->rmap[gfn_offset]);
+
+		for (i = 0; i<  KVM_NR_PAGE_SIZES - 1; i++) {
+			unsigned long gfn = memslot->base_gfn + gfn_offset;
+			unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
+			int idx = gfn / huge - memslot->base_gfn / huge;

Better to use a shift than a division here.

+
+			if (!(gfn_offset || (gfn % huge)))
+				break;

Why?

+			rmapp_remove_write_access(kvm,
+					&memslot->lpage_info[i][idx].rmap_pde);
+		}
+	}
+	kvm_flush_remote_tlbs(kvm);
+}
+
  void kvm_mmu_zap_all(struct kvm *kvm)
  {
  	struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 038d719..3556b4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
  }

  /*
+ * Check the dirty bit ratio of a given memslot.
+ *   0: clean
+ *   1: sparse
+ *   2: dense
+ */
+static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
+{
+	if (!memslot->num_dirty_bits)
+		return 0;
+	if (memslot->num_dirty_bits<  memslot->npages / 128)
+		return 1;
+	return 2;
+}

Use an enum please.

+
+/*
   * Get (and clear) the dirty memory log for a memory slot.
   */
  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  				      struct kvm_dirty_log *log)
  {
-	int r;
+	int r, density;
  	struct kvm_memory_slot *memslot;
  	unsigned long n;

@@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  	n = kvm_dirty_bitmap_bytes(memslot);

  	/* If nothing is dirty, don't bother messing with page tables. */
-	if (memslot->num_dirty_bits) {
+	density = dirty_bitmap_density(memslot);
+	if (density) {
  		struct kvm_memslots *slots, *old_slots;
  		unsigned long *dirty_bitmap;

@@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
  		kfree(old_slots);

  		spin_lock(&kvm->mmu_lock);
-		kvm_mmu_slot_remove_write_access(kvm, log->slot);
+		if (density == 2)
+			kvm_mmu_slot_remove_write_access(kvm, log->slot);
+		else
+			kvm_mmu_slot_remove_write_access_mask(kvm,
+						&slots->memslots[log->slot],
+						dirty_bitmap);
  		spin_unlock(&kvm->mmu_lock);

wrt. O(1) write protection: hard to tell if the two methods can coexist. For direct mapped shadow pages (i.e. ep/npt) I think we can use the mask to speed up clearing of an individual sp's sptes.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux