Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
> On 11/30/2010 08:12 AM, Paolo Bonzini wrote:
>> On 11/30/2010 02:47 PM, Anthony Liguori wrote:
>>> On 11/30/2010 01:15 AM, Paolo Bonzini wrote:

> Juan's patch, IIUC, does the following: If you've been iterating in a
> tight loop, return to the main loop for *one* iteration every 50ms.

I don't think of it that way; it is more along the lines of: if you
have spent 50ms, it is time to let others run O:-)

But yes, this is one of the changes.

> But this means that during this 50ms period of time, a VCPU may be
> blocked from running.  If the guest isn't doing a lot of device I/O
> *and* you're on a relatively low link speed, then this will mean that
> you don't hold qemu_mutex for more than 50ms at a time.
>
> But in the degenerate case where you have a high speed link and you
> have a guest doing a lot of device I/O, you'll see the guest VCPU
> being blocked for 50ms, then getting to run for a very brief period of
> time, followed by another block for 50ms.  The guest's execution will
> be extremely sporadic.

OK, let's go back a bit.  We have two problems:
- vcpu stalls
- main_loop stalls

My patch reduces the stalls to a maximum of 50ms; right now they can be
much bigger.  I fully agree that doing ram_save_live() without
qemu_mutex is an improvement, but we would still be hogging the main
loop, so we need both my patch and the qemu_mutex split.  In other
words, splitting qemu_mutex fixes half of the problem, not all of it.
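To make the 50ms idea above concrete, here is a minimal sketch of a
time-bounded save pass.  It is not the actual patch: save_one_page(),
the 50ms constant and the clock helper are stand-ins, and real QEMU
code would use its own clock and dirty-page iteration functions.

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define MAX_SLICE_MS 50         /* time budget for one save pass */

/* Stand-in for "find and send one dirty page"; returns false when
 * there is nothing left to send. */
static bool save_one_page(void)
{
    static int remaining = 1000;    /* pretend we have 1000 dirty pages */
    return --remaining > 0;
}

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/* One pass of the save loop: send pages until we either run out of
 * work or use up the time slice, then return to the caller (the main
 * loop) so VCPUs waiting on qemu_mutex can make progress. */
static bool ram_save_slice(void)
{
    int64_t start = now_ms();
    bool more = true;

    while (more) {
        more = save_one_page();
        if (now_ms() - start >= MAX_SLICE_MS) {
            break;              /* spent our 50ms, let others run */
        }
    }
    return more;                /* true means "call me again later" */
}

A real implementation would read the clock only every few pages and
would still honour the bandwidth limit; the point is only that the
thing being bounded here is time spent holding the lock, not bytes on
the wire.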
> This isn't fixable with this approach.  The only way to really fix
> this is to say that over a given period of time, migration may only
> consume XX amount of CPU time which guarantees the VCPUs get the
> qemu_mutex for the rest of the time.
>
> This is exactly what rate limiting does.  Yes, it results in a longer
> migration time but that's the trade-off we have to make if we want
> deterministic VCPU execution until we can implement threading
> properly.

I give up on this part of the discussion.  Basically I abused one thing
(a timeout) that was trivial to understand: if we have spent too much
time, let others run.  It is not a perfect solution, but it is a big
improvement over what we had.

It has been suggested that I use the other approach instead: abuse
qemu_file_rate_limit() by creating a new qemu_file_write_nop() that
only increases the count of transmitted bytes, and then trust the rate
limiting code to fix the problem.  How that is nicer/clearer is
completely alien to me.  It probably also reduces the stalls (I have to
look at the interactions), but it is a complete lie: we would be
counting as transferred stuff that we are not transferring :(

Why don't I like it?  Because what I found when I started is that this
rate limiting works very badly if we are not able to run the
buffered_file limit every 100ms.  If we lose ticks of that timer, our
calculations are wrong.

> If you want a simple example, doing I/O with the rtl8139 adapter while
> doing your migration test and run a tight loop in the guest running
> gettimeofday().  Graph the results to see how much execution time the
> guest is actually getting.

I am still convinced that we need to limit the amount of time that an
io_handler can use, and that we should write scary things to the logs
when an io_handler needs too much time.

There are two independent problems here.  I fully agree that having
ram_save_live() run without qemu_mutex is an improvement, but it is an
improvement independently of my patch.

>>> In the long term, we need a new dirty bit interface from kvm.ko that
>>> uses a multi-level table.  That should dramatically improve scan
>>> performance.  We also need to implement live migration in a separate
>>> thread that doesn't carry qemu_mutex while it runs.
>>
>> This may be a good way to fix it, but it's also basically a rewrite.
>
> The only correct short term solution I can see is rate limiting,
> unfortunately.

I still think that this is an inferior solution: we would be using
something meant to limit the amount of data we write to the network to
limit the number of pages that we walk.

Later, Juan.

> Regards,
>
> Anthony Liguori
>
>> Paolo
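As an aside on the "write scary things to the logs" point above, this
is roughly the kind of instrumentation being suggested.  It is only a
sketch: run_io_handler(), the IOHandler typedef as used here and the
20ms threshold are illustrative, not existing QEMU code.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define IO_HANDLER_WARN_MS 20   /* arbitrary threshold for this sketch */

typedef void IOHandler(void *opaque);

static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/* Run one io_handler and complain loudly if it hogged the main loop,
 * since any time spent here is time the VCPUs cannot take qemu_mutex. */
static void run_io_handler(IOHandler *fn, void *opaque, const char *name)
{
    int64_t start = now_ms();

    fn(opaque);

    int64_t elapsed = now_ms() - start;
    if (elapsed > IO_HANDLER_WARN_MS) {
        fprintf(stderr, "warning: io_handler %s ran for %" PRId64 " ms\n",
                name, elapsed);
    }
}

Whether 20ms is the right threshold is a separate discussion; the point
is that the main loop would at least tell us which handler is
responsible for a stall.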