On Sun, 15 Apr 2012 12:32:59 +0300
Avi Kivity <avi@xxxxxxxxxx> wrote:

> Just to throw another idea into the mix - we can have write-protect-less
> dirty logging, too.  Instead of write protection, drop the dirty bit,
> and check it again when reading the dirty log.  It might look like we're
> accessing the spte twice here, but it's actually just once - when we
> check it to report for GET_DIRTY_LOG call N, we also prepare it for call
> N+1.

kvm-ppc's dirty tracking, implemented by Paul, uses information supplied
by the hardware and seems similar to what you described here.  We may be
able to get feedback from the kvm-ppc developers.

> This doesn't work for EPT, which lacks a dirty bit.  But we can emulate
> it: take a free bit and call it spte.NOTDIRTY; when it is set, we also
> clear spte.WRITE, and teach the mmu that if it sees spte.NOTDIRTY it
> can just set spte.WRITE and clear spte.NOTDIRTY.  Now that looks exactly
> like Xiao's lockless write enabling.

How do we sync with dirty_bitmap?

> Another note: O(1) write protection is not mutually exclusive with rmap
> based write protection.  In GET_DIRTY_LOG, you write protect everything,
> and proceed to write enable on faults.  When you reach the page table
> level, you perform the rmap check to see if you should write protect or
> not.  With role.direct=1 the check is very cheap (and sometimes you can
> drop the entire page table and replace it with a large spte).

I understand that there are many possible combinations, but the question
is whether the complexity is really worth it.

Once, when we were searching for a way to do an atomic bitmap switch,
you told me that we should do our best not to add overhead to VCPU
threads.  Since then, I have tried my best to mitigate the latency
problem without adding code to VCPU-thread paths: if we add the
cond_resched patch, we will get a simple solution to the currently known
problem -- 64GB guests will probably work well without big latencies,
once QEMU gets improved.
I also surveyed other well-known hypervisors internally.  We can easily
see hundreds of milliseconds of latency during migration, but people
rarely complain about that as long as the hypervisor is stable and
usable in most situations.

Although O(1) really is O(1) for the GET_DIRTY_LOG thread, it adds some
overhead to page fault handling.  We may need to hold mmu_lock to handle
O(1)'s write protection properly, and ~500 write protections will not be
so cheap.  And there is still no answer to the question of how to
achieve slot-wise write protection.

Of course, we may need such tree-wide write protection when we want to
support guests with hundreds of GB, or TB, of memory.  Sadly, that is
not the case now.

Well, if you need the best answer now, we should discuss the whole
design: KVM Forum may be a good place for that.

Thanks,
	Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html