On 11/30/2010 08:12 AM, Paolo Bonzini wrote:
On 11/30/2010 02:47 PM, Anthony Liguori wrote:
On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
On 11/30/2010 03:11 AM, Anthony Liguori wrote:
BufferedFile should hit the qemu_file_rate_limit check when the socket
buffer gets filled up.
The problem is that the file rate limit is not hit, because the work is
done elsewhere. The rate limit can cap the bandwidth used and make QEMU
aware that socket operations may block (because that's what the
buffered file freeze/unfreeze logic does); but it cannot be used to
limit the _time_ spent in the migration code.
Yes, it can, if you set the rate limit sufficiently low.
You mean, just like you can drive a car without brakes by keeping the
speed sufficiently low.
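To be concrete about what that check does and does not measure, here is
a minimal sketch of a byte-based limiter (made-up names, not the actual
BufferedFile code):

/* Sketch only: a buffered sender that refuses to write once the byte
 * budget for the current interval is used up, or while the socket
 * would block (the freeze/unfreeze case). */
#include <stddef.h>
#include <stdint.h>

struct buf_file {
    size_t bytes_xfer;     /* bytes accounted in this interval */
    size_t xfer_limit;     /* bytes allowed per interval */
    int    freeze_output;  /* set when the socket would block */
};

static int rate_limit_hit(struct buf_file *f)
{
    return f->bytes_xfer >= f->xfer_limit;
}

static size_t buf_put(struct buf_file *f, const uint8_t *data, size_t len)
{
    if (f->freeze_output || rate_limit_hit(f)) {
        return 0;              /* caller must back off until unfrozen */
    }
    f->bytes_xfer += len;      /* charge the bytes against the budget */
    (void)data;                /* the actual socket write would go here;
                                * on EAGAIN, set freeze_output instead
                                * of blocking */
    return len;
}

Nothing in that path accounts for how long the dirty-page scan itself
takes; it only counts the bytes that reach the socket.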
[..] accounting zero pages as full sized
pages should "fix" the problem.
I know you used quotes, but that is a very, very generous definition of
"fix". Both of these proposed "fixes" are nothing more than workarounds,
and particularly ugly ones at that. The worst thing about them is that
there is no guarantee of migration finishing in a reasonable time, or
at all.
If you account zero pages as full pages, you don't use the bandwidth
that was allotted to you effectively; you use only about 0.2% of it
(8/4096). It then takes an exaggerated amount of time to start iterating
over the pages that matter. If you set the bandwidth low instead, you do
not have the bandwidth you need in order to converge.
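Just to spell out the 0.2% figure: a zero page goes out as a handful of
bytes on the wire, so whether the limiter is charged 8 bytes or a full
4096 per page changes the accounting by a factor of roughly 500. A
sketch of the two accounting choices (hypothetical helper, not Juan's
actual patch):

#include <stddef.h>

#define TARGET_PAGE_SIZE    4096
#define ZERO_PAGE_WIRE_SIZE 8    /* roughly: page header plus one byte */

/* Charge one transmitted page against the bandwidth budget.  Counting a
 * zero page at its wire size consumes only 8/4096 ~= 0.2% of the budget
 * per page; counting it as a full page makes the limiter throttle the
 * scan 512 times sooner. */
static size_t account_page(size_t bytes_xfer, int is_zero,
                           int count_zero_as_full)
{
    if (is_zero && !count_zero_as_full) {
        return bytes_xfer + ZERO_PAGE_WIRE_SIZE;
    }
    return bytes_xfer + TARGET_PAGE_SIZE;
}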
Even from an aesthetic point of view, if there is such a thing, I
don't understand why you advocate conflating network bandwidth and CPU
usage into a single measurement. Nobody disagrees that everything you
propose is nice to have, or that what Juan sent is a stopgap measure
(though a very effective one). However, this doesn't negate that
Juan's accounting patches make a lot of sense in the current design.
Juan's patch, IIUC, does the following: If you've been iterating in a
tight loop, return to the main loop for *one* iteration every 50ms.
But this means that during this 50ms period of time, a VCPU may be
blocked from running. If the guest isn't doing a lot of device I/O
*and* you're on a relatively low link speed, then this will mean that
you don't hold qemu_mutex for more than 50ms at a time.
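In other words, the control flow is roughly the following (a sketch of
the behaviour described above, not the patch itself; the callback name
is invented):

#include <stdint.h>
#include <time.h>

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* Keep sending dirty pages, but bail out to the main loop once about
 * 50ms have been spent in this tight loop. */
static void ram_save_iterate_sketch(int (*send_next_dirty_page)(void))
{
    int64_t start = now_ms();

    while (send_next_dirty_page()) {       /* hypothetical callback */
        if (now_ms() - start > 50) {
            break;    /* run one main-loop iteration, then come back */
        }
    }
}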
But in the degenerate case where you have a high-speed link and a guest
doing a lot of device I/O, you'll see the guest VCPU being blocked for
50ms, then getting to run for a very brief period of time, followed by
another 50ms block. The guest's execution will be extremely sporadic.
This isn't fixable with this approach. The only way to really fix it is
to say that, over a given period of time, migration may only consume
XX amount of CPU time, which guarantees that the VCPUs get the
qemu_mutex for the rest of the time.
This is exactly what rate limiting does. Yes, it results in a longer
migration time but that's the trade-off we have to make if we want
deterministic VCPU execution until we can implement threading properly.
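Concretely, that means expressing the budget in CPU time rather than in
bytes. A rough sketch with invented names, where the caller supplies the
current wall-clock time and the migration thread's CPU time in
milliseconds:

#include <stdint.h>

struct cpu_budget {
    int64_t period_ms;     /* accounting window, e.g. 100ms */
    int64_t budget_ms;     /* CPU time migration may use per window */
    int64_t window_start;  /* wall-clock time the current window began */
    int64_t cpu_at_start;  /* migration CPU time at window start */
};

/* Returns non-zero if migration may keep running in this window; the
 * rest of the window is left to the VCPUs, which can then take
 * qemu_mutex without waiting behind the migration loop. */
static int migration_may_run(struct cpu_budget *b,
                             int64_t wall_ms, int64_t cpu_ms)
{
    if (wall_ms - b->window_start >= b->period_ms) {
        b->window_start = wall_ms;     /* new window, reset the budget */
        b->cpu_at_start = cpu_ms;
    }
    return (cpu_ms - b->cpu_at_start) < b->budget_ms;
}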
If you want a simple example, do I/O with the rtl8139 adapter while
running your migration test, and run a tight loop in the guest calling
gettimeofday(). Graph the results to see how much execution time the
guest is actually getting.
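Something along these lines is enough on the guest side (a small
standalone test, nothing QEMU-specific; the 10ms threshold is
arbitrary):

#include <stdio.h>
#include <sys/time.h>

/* Spin calling gettimeofday() and report every gap longer than ~10ms
 * in which the VCPU did not get to run; the output can be graphed to
 * show how sporadic guest execution becomes during migration. */
int main(void)
{
    struct timeval prev, cur;
    gettimeofday(&prev, NULL);

    for (;;) {
        gettimeofday(&cur, NULL);
        long gap_us = (cur.tv_sec - prev.tv_sec) * 1000000L
                      + (cur.tv_usec - prev.tv_usec);
        if (gap_us > 10000) {
            printf("%ld.%06ld stalled for %ld us\n",
                   (long)cur.tv_sec, (long)cur.tv_usec, gap_us);
        }
        prev = cur;
    }
    return 0;
}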
In the long term, we need a new dirty bit interface from kvm.ko that
uses a multi-level table. That should dramatically improve scan
performance. We also need to implement live migration in a separate
thread that doesn't carry qemu_mutex while it runs.
This may be a good way to fix it, but it's also basically a rewrite.
The only correct short-term solution I can see is rate limiting,
unfortunately.
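For what it's worth, the "multi-level table" above is just a two-level
bitmap: a summary word says which chunks contain any dirty pages at all,
so a mostly-clean guest can be scanned without walking every per-page
bit. A sketch of the data structure only, not a proposed kvm.ko
interface:

#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_CHUNK 64        /* one uint64_t of page bits per chunk */

struct dirty_log {
    uint64_t *chunk_summary;      /* bit i set => chunk i has dirty pages */
    uint64_t *page_bits;          /* one uint64_t per chunk, one bit per page */
    size_t    nr_chunks;
};

static void mark_dirty(struct dirty_log *log, size_t page)
{
    size_t chunk = page / PAGES_PER_CHUNK;

    log->page_bits[chunk]          |= 1ULL << (page % PAGES_PER_CHUNK);
    log->chunk_summary[chunk / 64] |= 1ULL << (chunk % 64);
}

/* The scan only needs to visit chunks whose summary bit is set, instead
 * of testing every bit in the per-page bitmap. */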
Regards,
Anthony Liguori