2010/4/23 Avi Kivity <avi@xxxxxxxxxx>:
> On 04/23/2010 04:22 PM, Anthony Liguori wrote:
>>>
>>> I currently don't have data, but I'll prepare it.
>>> There were two things I wanted to avoid.
>>>
>>> 1. Pages being copied to the QEMUFile buffer through qemu_put_buffer.
>>> 2. Calling write() every time, even when we want to send multiple
>>> pages at once.
>>>
>>> I think 2 may be negligible.
>>> But 1 seems to be problematic if we want to make the latency as small
>>> as possible, no?
>>
>> Copying often has strange CPU characteristics depending on whether the
>> data is already in cache. It's better to drive these sorts of
>> optimizations through performance measurement because changes are not
>> always obvious.
>
> Copying always introduces more cache pollution, so even if the data is
> in the cache, it is worthwhile (not disagreeing with the need to
> measure).

Anthony,

I measured how long it takes to send all guest pages during migration,
and I would like to share the results in this message. For convenience,
I modified the code to do a plain migration rather than a "live
migration", which means the buffered file is not used here.

In summary, the performance improvement from using writev instead of
write/send seems negligible over GbE; however, when the underlying
network was fast (InfiniBand with IPoIB in this case), writev performed
17% faster than write/send, so it may be worthwhile to introduce
vectored I/O (rough sketches of the batching and of the timing are
appended after the sign-off).

Since QEMU compresses pages, I copied a junk file to tmpfs to dirty the
pages, so that QEMU would transfer a meaningful number of them. After
setting up the guest, I used cpu_get_real_ticks() to measure the time
spent in the while loop calling ram_save_block() in ram_save_live(). I
removed the qemu_file_rate_limit() check to disable the buffered file's
rate limiting, so all of the pages were transferred in the first round.

I took 10 measurements for each case and report the average and
standard deviation. Considering the results, I think the number of
trials was sufficient. In addition to the elapsed time, the number of
writev/write calls and the number of pages that were compressed (dup)
or not compressed (nodup) are shown.

Test environment:

CPU: 2x Intel Xeon Dual Core 3GHz
Mem size: 6GB
Network: GbE, InfiniBand (IPoIB)
Host OS: Fedora 11 (kernel 2.6.34-rc1)
Guest OS: Fedora 11 (kernel 2.6.33)
Guest Mem size: 512MB

* GbE writev
time (sec):   35.732   (std 0.002)
write count:  4        (std 0)
writev count: 8269     (std 1)
dup count:    36157    (std 124)
nodup count:  1016808  (std 147)

* GbE write
time (sec):   35.780   (std 0.164)
write count:  127367   (std 21)
writev count: 0        (std 0)
dup count:    36134    (std 108)
nodup count:  1016853  (std 165)

* IPoIB writev
time (sec):   13.889   (std 0.155)
write count:  4        (std 0)
writev count: 8267     (std 1)
dup count:    36147    (std 105)
nodup count:  1016838  (std 111)

* IPoIB write
time (sec):   16.777   (std 0.239)
write count:  127364   (std 24)
writev count: 0        (std 0)
dup count:    36173    (std 169)
nodup count:  1016840  (std 190)

Although the improvement wasn't obvious when the network was GbE,
introducing writev may be worthwhile when we focus on faster networks
like InfiniBand/10GbE.

I agree with separating this optimization from the main logic of
Kemari, since this modification must be applied widely and carefully at
the same time.

Thanks,

Yoshi
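
P.S. For concreteness, here is a minimal sketch of the batching idea
measured above. It is not the actual QEMU/Kemari patch: instead of
copying each page into a staging buffer and issuing one write() per
page, the sender collects pointers to the pages in an iovec array and
flushes them with a single writev() call. GUEST_PAGE_SIZE, MAX_IOV,
queue_page() and flush_pages() are made-up names for illustration;
writev(2) itself is standard POSIX.

    #include <sys/uio.h>    /* writev, struct iovec */
    #include <stdio.h>      /* perror */

    #define GUEST_PAGE_SIZE 4096
    #define MAX_IOV         1024   /* flush threshold; bounded by IOV_MAX */

    static struct iovec iov[MAX_IOV];
    static int iov_cnt;

    /* Send every queued page with as few writev() calls as possible,
     * handling short writes by advancing through the iovec array. */
    static int flush_pages(int fd)
    {
        int i = 0;

        while (i < iov_cnt) {
            ssize_t n = writev(fd, &iov[i], iov_cnt - i);
            if (n < 0) {
                perror("writev");
                return -1;
            }
            /* Skip entries that were written completely... */
            while (i < iov_cnt && (size_t)n >= iov[i].iov_len) {
                n -= iov[i].iov_len;
                i++;
            }
            /* ...and adjust an entry that was written partially. */
            if (i < iov_cnt && n > 0) {
                iov[i].iov_base = (char *)iov[i].iov_base + n;
                iov[i].iov_len -= (size_t)n;
            }
        }
        iov_cnt = 0;
        return 0;
    }

    /* Queue one guest page: no memcpy into a staging buffer, the iovec
     * entry just points at the page itself. */
    static int queue_page(int fd, void *page)
    {
        iov[iov_cnt].iov_base = page;
        iov[iov_cnt].iov_len  = GUEST_PAGE_SIZE;
        if (++iov_cnt == MAX_IOV) {
            return flush_pages(fd);
        }
        return 0;
    }

The point is that the kernel gathers the pages directly from guest
memory, so both the extra copy (item 1 above) and the per-page syscall
(item 2) disappear at once.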
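
The measurement itself is nothing more than bracketing the send loop.
A portable sketch, with clock_gettime() standing in for QEMU's
cpu_get_real_ticks() and send_all_pages() a made-up placeholder for the
while loop around ram_save_block():

    #include <time.h>
    #include <stdio.h>

    /* Placeholder for the loop calling ram_save_block(); not real QEMU
     * code. */
    static void send_all_pages(void) { /* ... */ }

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        double t0 = now_sec();
        send_all_pages();
        printf("time (sec): %.3f\n", now_sec() - t0);
        return 0;
    }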