On 11/30/2010 08:27 AM, Avi Kivity wrote:
On 11/30/2010 04:17 PM, Anthony Liguori wrote:
What's the problem with burning that CPU? Per guest page,
compressing takes less time than sending. Is it just an issue of qemu
mutex hold time?
If you have a 512GB guest, then you have a 16MB dirty bitmap, which
ends up being a 128MB dirty bitmap in QEMU because we represent
each page's dirty bits with a full byte.
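(For reference, the arithmetic behind those numbers, assuming 4KB pages:
512GB / 4KB = 128M pages; at one bit per page that is 16MB, and at one
byte per page it is 128MB.)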
Was there not a patchset to split each bit into its own bitmap? And
then copy the kvm or qemu master bitmap into each client bitmap as it
became needed?
Walking 16MB (or 128MB) of memory just to find a few pages to send
over the wire is a big waste of CPU time. If kvm.ko used a
multi-level table to represent dirty info, we could walk the memory
map in 2MB chunks, allowing us to skip a large number of the
comparisons.
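Roughly what I have in mind, as a minimal sketch (the top-level chunk
bitmap and all names here are hypothetical, not anything kvm.ko exposes
today):

/*
 * Two-level walk: a top-level bitmap with one bit per 2MB chunk lets the
 * scan skip 512 page bits at a time whenever the whole chunk is clean.
 */
#include <stdint.h>

#define PAGE_SHIFT      12
#define CHUNK_SHIFT     21                                   /* 2MB chunks */
#define PAGES_PER_CHUNK (1 << (CHUNK_SHIFT - PAGE_SHIFT))    /* 512 pages  */

static inline int test_bit(const uint64_t *map, unsigned long nr)
{
    return (map[nr / 64] >> (nr % 64)) & 1;
}

/* Call send_page() for every dirty page, skipping fully-clean 2MB chunks. */
static void walk_dirty(const uint64_t *chunk_map,   /* 1 bit per 2MB chunk */
                       const uint64_t *page_map,    /* 1 bit per 4KB page  */
                       unsigned long nr_pages,
                       void (*send_page)(unsigned long pfn))
{
    unsigned long nr_chunks = (nr_pages + PAGES_PER_CHUNK - 1) / PAGES_PER_CHUNK;

    for (unsigned long c = 0; c < nr_chunks; c++) {
        if (!test_bit(chunk_map, c)) {
            continue;                        /* whole 2MB chunk is clean */
        }
        unsigned long first = c * PAGES_PER_CHUNK;
        for (unsigned long p = first;
             p < first + PAGES_PER_CHUNK && p < nr_pages; p++) {
            if (test_bit(page_map, p)) {
                send_page(p);
            }
        }
    }
}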
There's no reason to assume dirty pages would be clustered. If 0.2%
of memory were dirty, but scattered uniformly, there would be no win
from the two-level bitmap. A loss, in fact: 2MB can be represented as
512 bits or 64 bytes, just one cache line. Any two-level thing will
need more.
We might have a more compact encoding for sparse bitmaps, like
run-length encoding.
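Something like this, purely as an illustration of the encoding (not an
existing QEMU/KVM format): represent the bitmap as (start, length) runs
of set bits, which stays small when dirty pages are sparse.

#include <stddef.h>
#include <stdint.h>

struct dirty_run {
    uint64_t start;     /* first dirty page frame number in the run */
    uint64_t len;       /* number of consecutive dirty pages        */
};

static inline int test_bit(const uint64_t *map, uint64_t nr)
{
    return (map[nr / 64] >> (nr % 64)) & 1;
}

/* Returns the number of runs written into 'runs' (at most max_runs). */
static size_t encode_runs(const uint64_t *bitmap, uint64_t nr_pages,
                          struct dirty_run *runs, size_t max_runs)
{
    size_t n = 0;
    uint64_t p = 0;

    while (p < nr_pages && n < max_runs) {
        while (p < nr_pages && !test_bit(bitmap, p)) {
            p++;                               /* skip clean pages */
        }
        if (p == nr_pages) {
            break;
        }
        runs[n].start = p;
        while (p < nr_pages && test_bit(bitmap, p)) {
            p++;                               /* extend the run */
        }
        runs[n].len = p - runs[n].start;
        n++;
    }
    return n;
}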
In the short term, fixing (2) by accounting zero pages as full-sized
pages should "fix" the problem.
In the long term, we need a new dirty bit interface from kvm.ko
that uses a multi-level table. That should dramatically improve
scan performance.
Why would a multi-level table help? (or rather, please explain what
you mean by a multi-level table).
Something we could do is divide memory into more slots, and polling
each slot when we start to scan its page range. That reduces the
time between sampling a page's dirtiness and sending it off, and
reduces the latency incurred by the sampling. There are also
non-interface-changing ways to reduce this latency, like O(1) write
protection, or using dirty bits instead of write protection when
available.
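As a rough sketch of the per-slot polling (the ram_slot layout and fd
handling here are assumed for illustration; KVM_GET_DIRTY_LOG itself is
the existing ioctl): sync one slot's log from kvm.ko only when we are
about to scan that slot's page range, instead of syncing everything up
front.

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

struct ram_slot {
    uint32_t  slot_id;        /* KVM memory slot number                   */
    uint64_t  nr_pages;       /* pages covered by this slot               */
    uint64_t *dirty_bitmap;   /* 1 bit per page, nr_pages/8 rounded up    */
};

/* Pull the dirty log for just this slot, right before scanning it. */
static int sync_slot_dirty_log(int vm_fd, struct ram_slot *slot)
{
    struct kvm_dirty_log log = {
        .slot = slot->slot_id,
        .dirty_bitmap = slot->dirty_bitmap,
    };

    return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}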
BTW, we should also refactor qemu to use the kvm dirty bitmap
directly instead of mapping it to the main dirty bitmap.
That's what the patch set I was alluding to did. Or maybe I imagined
the whole thing.
No, it just split the main bitmap into three bitmaps. I'm suggesting
that we give the dirty interface two implementations, one that
refers to the 8-bit bitmap when TCG is in use and another that uses the
KVM representation.
TCG really needs multiple dirty bits but KVM doesn't. A shared
implementation really can't be optimal.
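Roughly, as a sketch (all names hypothetical, not existing QEMU
interfaces): a small ops table so TCG keeps its byte-per-page,
multi-client bitmap while KVM's bit-per-page log is consulted directly.

#include <stdbool.h>
#include <stdint.h>

typedef struct DirtyOps {
    bool (*get_dirty)(uint64_t pfn, unsigned client);
    void (*reset_dirty)(uint64_t pfn, unsigned client);
} DirtyOps;

/* TCG: one byte per page, one flag bit per client (VGA, CODE, MIGRATION). */
extern uint8_t *phys_dirty;

static bool tcg_get_dirty(uint64_t pfn, unsigned client)
{
    return phys_dirty[pfn] & (1u << client);
}

static void tcg_reset_dirty(uint64_t pfn, unsigned client)
{
    phys_dirty[pfn] &= ~(1u << client);
}

static const DirtyOps tcg_dirty_ops = {
    .get_dirty   = tcg_get_dirty,
    .reset_dirty = tcg_reset_dirty,
};

/* KVM: one bit per page straight from the kernel's log; migration is the
 * only client, so the client argument is ignored. */
extern uint64_t *kvm_dirty_bitmap;

static bool kvm_get_dirty(uint64_t pfn, unsigned client)
{
    (void)client;
    return (kvm_dirty_bitmap[pfn / 64] >> (pfn % 64)) & 1;
}

static void kvm_reset_dirty(uint64_t pfn, unsigned client)
{
    (void)client;
    kvm_dirty_bitmap[pfn / 64] &= ~(1ULL << (pfn % 64));
}

static const DirtyOps kvm_dirty_ops = {
    .get_dirty   = kvm_get_dirty,
    .reset_dirty = kvm_reset_dirty,
};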
We also need to implement live migration in a separate thread that
doesn't hold qemu_mutex while it runs.
IMO that's the biggest hit currently.
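A minimal sketch of what that could look like (the migrate_*() helpers
are placeholders, not existing QEMU functions): do the bulk scanning and
sending in a dedicated thread, taking qemu_mutex only around the short
sections that really touch guest/device state.

#include <pthread.h>
#include <stdbool.h>

extern pthread_mutex_t qemu_mutex;

extern bool migrate_more_dirty_pages(void);     /* placeholder               */
extern void migrate_sync_dirty_log(void);       /* placeholder: needs lock    */
extern void migrate_send_dirty_pages(void);     /* placeholder: lock-free I/O */
extern void migrate_send_device_state(void);    /* placeholder: needs lock    */

static void *migration_thread(void *opaque)
{
    (void)opaque;

    while (migrate_more_dirty_pages()) {
        pthread_mutex_lock(&qemu_mutex);
        migrate_sync_dirty_log();               /* brief: touches guest state */
        pthread_mutex_unlock(&qemu_mutex);

        migrate_send_dirty_pages();             /* long: no lock held for I/O */
    }

    pthread_mutex_lock(&qemu_mutex);
    migrate_send_device_state();                /* final stop-and-copy phase  */
    pthread_mutex_unlock(&qemu_mutex);
    return NULL;
}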
Yup. That's the Correct solution to the problem.
Then let's just Do it.
Yup.
Regards,
Anthony Liguori