On 11/30/2010 07:58 AM, Avi Kivity wrote:
On 11/30/2010 03:47 PM, Anthony Liguori wrote:
On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
On 11/30/2010 03:11 AM, Anthony Liguori wrote:
BufferedFile should hit the qemu_file_rate_limit check when the socket
buffer gets filled up.
The problem is that the file rate limit is not hit because the work is
done elsewhere. The rate limit can cap the bandwidth used and make
QEMU aware that socket operations may block (because that's what the
buffered file freeze/unfreeze logic does); but it cannot be used to
limit the _time_ spent in the migration code.
Yes, it can, if you set the rate limit sufficiently low.
The caveats are 1) the kvm.ko interface for dirty bits doesn't scale
for large memory guests, so we spend a lot more CPU time walking it
than we should, and 2) zero pages cause us to burn a lot more CPU time
than we otherwise would because compressing them is so effective.
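For reference, a minimal sketch of the rate-limited iteration being
discussed, assuming qemu_file_rate_limit() and ram_save_block() behave
roughly as in the QEMU migration code of this period (the declarations
below are stand-ins so the fragment stands alone, not the real
headers):

typedef struct QEMUFile QEMUFile;
extern int qemu_file_rate_limit(QEMUFile *f); /* nonzero once budget spent */
extern int ram_save_block(QEMUFile *f);       /* sends one block, 0 if none */

static void ram_save_iteration(QEMUFile *f)
{
    /* Keep sending dirty pages until the per-iteration bandwidth
     * budget is used up; a low enough limit therefore also bounds the
     * time spent here per pass, modulo the scanning cost. */
    while (!qemu_file_rate_limit(f)) {
        if (!ram_save_block(f)) {
            break;              /* no more dirty pages this pass */
        }
    }
}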
What's the problem with burning that CPU? Per guest page, compressing
takes less time than sending. Is it just an issue of qemu_mutex hold time?
If you have a 512GB guest, then you have a 16MB dirty bitmap, which ends
up being a 128MB dirty bitmap in QEMU because QEMU represents each dirty
bit as a full byte (512GB of 4KB pages is 128M pages: one bit per page
is 16MB, one byte per page is 128MB).
Walking 16MB (or 128MB) of memory just to find a few pages to send
over the wire is a big waste of CPU time. If kvm.ko used a multi-level
table to represent dirty info, we could walk the memory map in 2MB
chunks, allowing us to skip a large number of the comparisons.
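To illustrate the idea, here is a minimal sketch of a two-level dirty
table, where one top-level bit covers a 2MB chunk and the scan only
descends into the per-page bits when that chunk bit is set; the
structure and names are made up for this example, not the kvm.ko
interface:

#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE        4096ULL
#define CHUNK_SIZE       (2ULL << 20)               /* 2MB */
#define PAGES_PER_CHUNK  (CHUNK_SIZE / PAGE_SIZE)   /* 512 pages */

struct dirty_table {
    unsigned long *chunk_bits;  /* one bit per 2MB chunk */
    unsigned long *page_bits;   /* one bit per 4KB page  */
    size_t nr_chunks;
};

static int test_bit(const unsigned long *map, size_t nr)
{
    size_t bits = 8 * sizeof(unsigned long);
    return (map[nr / bits] >> (nr % bits)) & 1;
}

/* Visit every dirty page, skipping clean 2MB chunks without looking
 * at their 512 per-page bits. */
static void scan_dirty(const struct dirty_table *dt,
                       void (*send_page)(size_t gfn))
{
    for (size_t c = 0; c < dt->nr_chunks; c++) {
        if (!test_bit(dt->chunk_bits, c)) {
            continue;           /* whole 2MB chunk is clean */
        }
        for (size_t p = 0; p < PAGES_PER_CHUNK; p++) {
            size_t gfn = c * PAGES_PER_CHUNK + p;
            if (test_bit(dt->page_bits, gfn)) {
                send_page(gfn);
            }
        }
    }
}

For a mostly-clean 512GB guest this walks 256K chunk bits in the common
case instead of 128M page bits.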
In the short term, fixing (2) by accounting zero pages as full-sized
pages should "fix" the problem.
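A sketch of that accounting change, kept generic rather than copied
from the actual ram_save_* code (put_bytes() is a hypothetical stand-in
for the wire output path):

#include <stdint.h>
#include <stddef.h>

static int is_zero_page(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return 0;
        }
    }
    return 1;
}

/* Returns the number of bytes to charge against the migration rate
 * limit for this page. */
static size_t save_page(const uint8_t *page, size_t page_size,
                        void (*put_bytes)(const uint8_t *, size_t))
{
    static const uint8_t zero_marker = 0;    /* illustrative marker */

    if (is_zero_page(page, page_size)) {
        put_bytes(&zero_marker, 1);   /* almost nothing on the wire... */
        return page_size;             /* ...but charge a full page so the
                                         bandwidth cap also caps CPU time */
    }
    put_bytes(page, page_size);
    return page_size;
}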
In the long term, we need a new dirty bit interface from kvm.ko that
uses a multi-level table. That should dramatically improve scan
performance.
Why would a multi-level table help? (or rather, please explain what
you mean by a multi-level table).
Something we could do is divide memory into more slots and poll each
slot when we start to scan its page range. That reduces the time
between sampling a page's dirtiness and sending it off, and reduces
the latency incurred by the sampling. There are also
non-interface-changing ways to reduce this latency, like O(1) write
protection, or using dirty bits instead of write protection when
available.
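As a sketch of the per-slot polling idea, assuming memory has already
been split into several KVM memory slots and vm_fd is the VM file
descriptor, each slot's log can be fetched with the existing
KVM_GET_DIRTY_LOG ioctl only when the scan reaches that slot:

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <string.h>

/* Fetch (and clear) the dirty log for one memory slot.  Calling this
 * right before walking that slot's page range shortens the window
 * between sampling a page's dirtiness and sending it. */
static int fetch_slot_dirty_log(int vm_fd, unsigned int slot,
                                void *bitmap, size_t bitmap_bytes)
{
    struct kvm_dirty_log log;

    memset(bitmap, 0, bitmap_bytes);
    memset(&log, 0, sizeof(log));
    log.slot = slot;
    log.dirty_bitmap = bitmap;

    /* 0 on success, -1 with errno set otherwise */
    return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
}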
BTW, we should also refactor qemu to use the kvm dirty bitmap directly
instead of mapping it to the main dirty bitmap.
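A sketch of what using it directly could look like: walk the raw kvm
bitmap word by word with find-first-set, instead of first expanding it
into a byte-per-page array (mark_dirty() is a hypothetical callback):

#include <stddef.h>

static void walk_kvm_bitmap(const unsigned long *bitmap, size_t nr_pages,
                            void (*mark_dirty)(size_t gfn))
{
    const size_t bits = 8 * sizeof(unsigned long);
    size_t nr_words = (nr_pages + bits - 1) / bits;

    for (size_t i = 0; i < nr_words; i++) {
        unsigned long word = bitmap[i];

        while (word) {
            int bit = __builtin_ctzl(word);  /* lowest set bit */
            mark_dirty(i * bits + bit);
            word &= word - 1;                /* clear that bit */
        }
    }
}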
We also need to implement live migration in a separate thread that
doesn't hold qemu_mutex while it runs.
IMO that's the biggest hit currently.
Yup. That's the Correct solution to the problem.
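A minimal sketch of the threaded approach, with stubs standing in for
the real migration work and a local mutex standing in for the global
qemu_mutex (illustrative only, not actual QEMU code):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t qemu_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stubs for the real work. */
static bool migration_iterate_ram(void)        /* scan + send one pass */
{
    return true;                                /* pretend we converged */
}
static void migration_send_device_state(void)  /* needs guest paused */
{
}

static void *migration_thread(void *opaque)
{
    bool done = false;
    (void)opaque;

    while (!done) {
        /* The long-running part (RAM scanning and socket I/O) runs
         * without qemu_mutex held, so vcpus and the iothread are not
         * stalled behind the migration. */
        done = migration_iterate_ram();
    }

    /* Only the short final device-state pass takes the big lock. */
    pthread_mutex_lock(&qemu_mutex);
    migration_send_device_state();
    pthread_mutex_unlock(&qemu_mutex);
    return NULL;
}

The thread would be started with pthread_create(&tid, NULL,
migration_thread, NULL) when migration begins, instead of driving the
whole thing from the iothread.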
Regards,
Anthony Liguori