Re: [PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

Laurent Vivier <lvivier@xxxxxxxxxx> · Mon, 22 Jun 2015 12:37:07 +0200

On 22/06/2015 02:09, Paul Mackerras wrote:
> On Wed, Jun 17, 2015 at 07:30:09PM +0200, Laurent Vivier wrote:
>>
>> Tested-by: Laurent Vivier <lvivier@xxxxxxxxxx>
>>
>> Performance is better, but Paul could you explain why it is better if I disable dynamic micro-threading ?
>> Did I miss something ?
>>
>> My test system is an IBM Power S822L.
>>
>> I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
>> attached on the same core (with pinning option of virt-manager). Then, I
>> measure the time needed to compile a kernel in parallel in both guests
>> with "make -j 16".
>>
>> My kernel without micro-threading:
>>
>> real    37m23.424s                 real    37m24.959s
>> user    167m31.474s                user    165m44.142s
>> sys     113m26.195s                sys     113m45.072s
>>
>> With micro-threading patches (PATCH 1+2):
>>
>> target_smt_mode 0 [in fact it was 8 here, but it should behave like 0, as it is > max threads/sub-core]
>> dynamic_mt_modes 6
>>
>> real    32m13.338s                 real    32m26.652s
>> user    139m21.181s                user    140m20.994s
>> sys     77m35.339s                 sys     78m16.599s
>>
>> It's better, but if I disable dynamic micro-threading (still with PATCH 1+2 applied):
>>
>> target_smt_mode 0
>> dynamic_mt_modes 0
>>
>> real    30m49.100s                 real    30m48.161s
>> user    144m22.989s                user    142m53.886s
>> sys     65m4.942s                  sys     66m8.159s
>>
>> it's even better.
> 
> I think what's happening here is that with dynamic_mt_modes=0 the
> system alternates between the two guests, whereas with
> dynamic_mt_modes=6 it will spend some of the time running both guests
> simultaneously in two-way split mode.  Since you have two
> compute-bound guests that each have threads=1 and 8 vcpus, it can fill
> up the core either way.  In that case it is more efficient to fill up
> the core with vcpus from one guest and not have to split the core,
> firstly because you avoid the split/unsplit latency and secondly
> because the threads run a little faster in whole-core mode than in
> split-core.

Thank you for the explanation.

So does it make more sense to have vCPUs with threads?

It seems so.

I ran the same tests with two guests of 4 vCPUs x 2 threads each, running
concurrently on one 8-thread bare-metal core.

target_smt_mode 0
dynamic_mt_modes 0

real    35m33.142s                 real    35m44.967s
user    167m16.971s                user    163m14.320s
sys     84m19.618s                 sys     90m38.978s

target_smt_mode 0
dynamic_mt_modes 6

real    26m41.993s                 real    25m54.270s
user    130m31.362s                user    127m55.145s
sys     58m17.378s                 sys     55m10.202s

In this case, it really improves performance.
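
To put a number on it, here is a minimal sketch that converts the "real"
times above to seconds and computes the relative reduction per guest. The
helper names (to_seconds, reduction_percent) are only for illustration and
are not part of any tool; the timings are copied from the two runs of the
4 vCPUs x 2 threads test.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '35m33.142s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def reduction_percent(before: str, after: str) -> float:
    """Relative wall-clock reduction going from `before` to `after`, in percent."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100.0


# "real" times from the 4 vCPUs x 2 threads test, as (guest 1, guest 2).
static_real = ("35m33.142s", "35m44.967s")   # dynamic_mt_modes 0
dynamic_real = ("26m41.993s", "25m54.270s")  # dynamic_mt_modes 6

for guest, (before, after) in enumerate(zip(static_real, dynamic_real), start=1):
    pct = reduction_percent(before, after)
    print(f"guest {guest}: {pct:.1f}% less wall-clock time")
```

With these figures, the build takes roughly a quarter less wall-clock time in
each guest when dynamic_mt_modes is 6.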

Laurent