Re: [RFC PATCH 05/20] Introduce put_vector() and get_vector to QEMUFile and qemu_fopen_ops().

Anthony Liguori <aliguori@xxxxxxxxxxxxxxxxxx> · Mon, 03 May 2010 07:05:29 -0500

On 05/03/2010 04:32 AM, Yoshiaki Tamura wrote:
2010/4/23 Avi Kivity<avi@xxxxxxxxxx>:

On 04/23/2010 04:22 PM, Anthony Liguori wrote:

I currently don't have data, but I'll prepare it.
There were two things I wanted to avoid.

1. Pages to be copied to QEMUFile buf through qemu_put_buffer.
2. Calling write() everytime even when we want to send multiple pages at
once.

I think 2 may be neglectable.
But 1 seems to be problematic if we want make to the latency as small as
possible, no?

Copying often has strange CPU characteristics depending on whether the
data is already in cache.  It's better to drive these sort of optimizations
through performance measurement because changes are not always obvious.

Copying always introduces more cache pollution, so even if the data is in
the cache, it is worthwhile (not disagreeing with the need to measure).

Anthony,

I measure how long it takes to send all guest pages during migration, and I
would like to share the information in this message.  For convenience,
I modified
the code to do migration not "live migration" which means buffered file is not
used here.

In summary, the performance improvement using writev instead of write/send when
we used GbE seems to be neglectable, however, when the underlying network was
fast (InfiniBand with IPoIB in this case), writev performed 17% faster than
write/send, and therefore, it may be worthwhile to introduce vectors.

Since QEMU compresses pages, I copied a junk file to tmpfs to dirty pages to let
QEMU to transfer fine number of pages.  After setting up the guest, I used
cpu_get_real_ticks() to measure the time during the while loop calling
ram_save_block() in ram_save_live().  I removed the qemu_file_rate_limit() to
disable the function of buffered file, and all of the pages would be transfered
at the first round.

I measure 10 times for each, and took average and standard deviation.
Considering the results, I think the trial number was enough.  In addition to
time duration, number of writev/write and number of pages which were compressed
(dup)/not compressed (nodup) are demonstrated.

Test Environment:
CPU: 2x Intel Xeon Dual Core 3GHz
Mem size: 6GB
Network: GbE, InfiniBand (IPoIB)

Host OS: Fedora 11 (kernel 2.6.34-rc1)
Guest OS: Fedora 11 (kernel 2.6.33)
Guest Mem size: 512MB

* GbE writev
time (sec): 35.732 (std 0.002)
write count: 4 (std 0)
writev count: 8269 (std 1)
dup count: 36157 (std 124)
nodup count: 1016808 (std 147)

* GbE write
time (sec): 35.780 (std 0.164)
write count: 127367 (21)
writev count: 0 (std 0)
dup count: 36134 (std 108)
nodup count: 1016853 (std 165)

* IPoIB writev
time (sec): 13.889 (std 0.155)
write count: 4 (std 0)
writev count: 8267 (std 1)
dup count: 36147 (std 105)
nodup count: 1016838 (std 111)

* IPoIB write
time (sec): 16.777 (std 0.239)
write count: 127364 (24)
writev count: 0 (std 0)
dup count: 36173 (std 169)
nodup count: 1016840 (std 190)

Although the improvement wasn't obvious when the network wan GbE, introducing
writev may be worthwhile when we focus on faster networks like InfiniBand/10GE.

I agree that separating this optimization from the main logic of Kemari since
this modification must be done widely and carefully at the same time.

Okay.  It looks like it's clear that it's a win so let's split it out of 
the main series and we'll treat it separately.  I imagine we'll see even 
more positive results on 10 gbit and particularly if we move migration 
out into a separate thread.

Regards,

Anthony Liguori

Thanks,

Yoshi

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html