On 11/30/2010 08:12 AM, Paolo Bonzini wrote:
On 11/30/2010 02:47 PM, Anthony Liguori wrote:
On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
On 11/30/2010 03:11 AM, Anthony Liguori wrote:
BufferedFile should hit the qemu_file_rate_limit check when the socket
buffer gets filled up.
The problem is that the file rate limit is not hit, because the work is
done elsewhere. The rate limit can cap the bandwidth used and make QEMU
aware that socket operations may block (because that's what the
buffered file freeze/unfreeze logic does); but it cannot be used to
limit the _time_ spent in the migration code.
Yes, it can, if you set the rate limit sufficiently low.
You mean, just like you can drive a car without brakes by keeping the
speed sufficiently low.
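To be concrete about what that check does and does not measure, here is
a minimal sketch of a byte-based limiter (made-up names, not the actual
BufferedFile code):

/* Sketch only: a buffered sender that refuses to write once the byte
 * budget for the current interval is used up, or while the socket
 * would block (the freeze/unfreeze case). */
#include <stddef.h>
#include <stdint.h>

struct buf_file {
    size_t bytes_xfer;     /* bytes accounted in this interval */
    size_t xfer_limit;     /* bytes allowed per interval */
    int    freeze_output;  /* set when the socket would block */
};

static int rate_limit_hit(struct buf_file *f)
{
    return f->bytes_xfer >= f->xfer_limit;
}

static size_t buf_put(struct buf_file *f, const uint8_t *data, size_t len)
{
    if (f->freeze_output || rate_limit_hit(f)) {
        return 0;              /* caller must back off until unfrozen */
    }
    f->bytes_xfer += len;      /* charge the bytes against the budget */
    (void)data;                /* the actual socket write would go here;
                                * on EAGAIN, set freeze_output instead
                                * of blocking */
    return len;
}

Nothing in that path accounts for how long the dirty-page scan itself
takes; it only counts the bytes that reach the socket.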
[..] accounting zero pages as full sized
pages should "fix" the problem.
I know you used quotes, but that is a very, very generous definition of
"fix". Both of these proposed "fixes" are nothing more than workarounds,
and particularly ugly ones at that. The worst thing about them is that
there is no guarantee of migration finishing in a reasonable time, or
at all.
If you account zero pages as full pages, you don't use the bandwidth
that was allotted to you effectively; you use only about 0.2% of it
(8/4096). It then takes an exaggerated amount of time to start iterating
over the pages that matter. If you set the bandwidth low instead, you do
not have the bandwidth you need in order to converge.
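Just to spell out the 0.2% figure: a zero page goes out as a handful of
bytes on the wire, so whether the limiter is charged 8 bytes or a full
4096 per page changes the accounting by a factor of roughly 500. A
sketch of the two accounting choices (hypothetical helper, not Juan's
actual patch):

#include <stddef.h>

#define TARGET_PAGE_SIZE    4096
#define ZERO_PAGE_WIRE_SIZE 8    /* roughly: page header plus one byte */

/* Charge one transmitted page against the bandwidth budget.  Counting a
 * zero page at its wire size consumes only 8/4096 ~= 0.2% of the budget
 * per page; counting it as a full page makes the limiter throttle the
 * scan 512 times sooner. */
static size_t account_page(size_t bytes_xfer, int is_zero,
                           int count_zero_as_full)
{
    if (is_zero && !count_zero_as_full) {
        return bytes_xfer + ZERO_PAGE_WIRE_SIZE;
    }
    return bytes_xfer + TARGET_PAGE_SIZE;
}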
Even from an aesthetic point of view, if there is such a thing, I
don't understand why you advocate conflating network bandwidth and CPU
usage into a single measurement. Nobody disagrees that everything you
propose is nice to have, or that what Juan sent is a stopgap measure
(though a very effective one). However, this doesn't negate that
Juan's accounting patches make a lot of sense in the current design.
Juan's patch, IIUC, does the following: If you've been iterating in a
tight loop, return to the main loop for *one* iteration every 50ms.
But this means that during this 50ms period of time, a VCPU may be
blocked from running. If the guest isn't doing a lot of device I/O
*and* you're on a relatively low link speed, then this will mean that
you don't hold qemu_mutex for more than 50ms at a time.
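In other words, the control flow is roughly the following (a sketch of
the behaviour described above, not the patch itself; the callback name
is invented):

#include <stdint.h>
#include <time.h>

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* Keep sending dirty pages, but bail out to the main loop once about
 * 50ms have been spent in this tight loop. */
static void ram_save_iterate_sketch(int (*send_next_dirty_page)(void))
{
    int64_t start = now_ms();

    while (send_next_dirty_page()) {       /* hypothetical callback */
        if (now_ms() - start > 50) {
            break;    /* run one main-loop iteration, then come back */
        }
    }
}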
But in the degenerate case where you have a high-speed link and a guest
doing a lot of device I/O, you'll see the guest VCPU being blocked for
50ms, then getting to run for a very brief period of time, followed by
another 50ms block. The guest's execution will be extremely sporadic.
This isn't fixable with this approach. The only way to really fix it is
to say that, over a given period of time, migration may only consume
XX amount of CPU time, which guarantees that the VCPUs get the
qemu_mutex for the rest of the time.
This is exactly what rate limiting does. Yes, it results in a longer
migration time but that's the trade-off we have to make if we want
deterministic VCPU execution until we can implement threading properly.
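Concretely, that means expressing the budget in CPU time rather than in
bytes. A rough sketch with invented names, where the caller supplies the
current wall-clock time and the migration thread's CPU time in
milliseconds:

#include <stdint.h>

struct cpu_budget {
    int64_t period_ms;     /* accounting window, e.g. 100ms */
    int64_t budget_ms;     /* CPU time migration may use per window */
    int64_t window_start;  /* wall-clock time the current window began */
    int64_t cpu_at_start;  /* migration CPU time at window start */
};

/* Returns non-zero if migration may keep running in this window; the
 * rest of the window is left to the VCPUs, which can then take
 * qemu_mutex without waiting behind the migration loop. */
static int migration_may_run(struct cpu_budget *b,
                             int64_t wall_ms, int64_t cpu_ms)
{
    if (wall_ms - b->window_start >= b->period_ms) {
        b->window_start = wall_ms;     /* new window, reset the budget */
        b->cpu_at_start = cpu_ms;
    }
    return (cpu_ms - b->cpu_at_start) < b->budget_ms;
}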
If you want a simple example, do I/O with the rtl8139 adapter while
running your migration test, and run a tight loop in the guest calling
gettimeofday(). Graph the results to see how much execution time the
guest is actually getting.
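Something along these lines is enough on the guest side (a small
standalone test, nothing QEMU-specific; the 10ms threshold is
arbitrary):

#include <stdio.h>
#include <sys/time.h>

/* Spin calling gettimeofday() and report every gap longer than ~10ms
 * in which the VCPU did not get to run; the output can be graphed to
 * show how sporadic guest execution becomes during migration. */
int main(void)
{
    struct timeval prev, cur;
    gettimeofday(&prev, NULL);

    for (;;) {
        gettimeofday(&cur, NULL);
        long gap_us = (cur.tv_sec - prev.tv_sec) * 1000000L
                      + (cur.tv_usec - prev.tv_usec);
        if (gap_us > 10000) {
            printf("%ld.%06ld stalled for %ld us\n",
                   (long)cur.tv_sec, (long)cur.tv_usec, gap_us);
        }
        prev = cur;
    }
    return 0;
}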
In the long term, we need a new dirty bit interface from kvm.ko that
uses a multi-level table. That should dramatically improve scan
performance. We also need to implement live migration in a separate
thread that doesn't carry qemu_mutex while it runs.
This may be a good way to fix it, but it's also basically a rewrite.
The only correct short-term solution I can see is rate limiting,
unfortunately.
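For what it's worth, the "multi-level table" above is just a two-level
bitmap: a summary word says which chunks contain any dirty pages at all,
so a mostly-clean guest can be scanned without walking every per-page
bit. A sketch of the data structure only, not a proposed kvm.ko
interface:

#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_CHUNK 64        /* one uint64_t of page bits per chunk */

struct dirty_log {
    uint64_t *chunk_summary;      /* bit i set => chunk i has dirty pages */
    uint64_t *page_bits;          /* one uint64_t per chunk, one bit per page */
    size_t    nr_chunks;
};

static void mark_dirty(struct dirty_log *log, size_t page)
{
    size_t chunk = page / PAGES_PER_CHUNK;

    log->page_bits[chunk]          |= 1ULL << (page % PAGES_PER_CHUNK);
    log->chunk_summary[chunk / 64] |= 1ULL << (chunk % 64);
}

/* The scan only needs to visit chunks whose summary bit is set, instead
 * of testing every bit in the per-page bitmap. */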
Regards,
Anthony Liguori