On 22/06/2015 02:09, Paul Mackerras wrote:
> On Wed, Jun 17, 2015 at 07:30:09PM +0200, Laurent Vivier wrote:
>>
>> Tested-by: Laurent Vivier <lvivier@xxxxxxxxxx>
>>
>> Performance is better, but Paul could you explain why it is better if I disable dynamic micro-threading ?
>> Did I miss something ?
>>
>> My test system is an IBM Power S822L.
>>
>> I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
>> attached on the same core (with pinning option of virt-manager). Then, I
>> measure the time needed to compile a kernel in parallel in both guests
>> with "make -j 16".
>>
>> My kernel without micro-threading:
>>
>> real    37m23.424s      real    37m24.959s
>> user    167m31.474s     user    165m44.142s
>> sys     113m26.195s     sys     113m45.072s
>>
>> With micro-threading patches (PATCH 1+2):
>>
>> target_smt_mode 0 [in fact It was 8 here, but it should behave like 0, as it is > max threads/sub-core]
>> dynamic_mt_modes 6
>>
>> real    32m13.338s      real    32m26.652s
>> user    139m21.181s     user    140m20.994s
>> sys     77m35.339s      sys     78m16.599s
>>
>> It's better, but if I disable dynamic micro-threading (but PATCH 1+2):
>>
>> target_smt_mode 0
>> dynamic_mt_modes 0
>>
>> real    30m49.100s      real    30m48.161s
>> user    144m22.989s     user    142m53.886s
>> sys     65m4.942s       sys     66m8.159s
>>
>> it's even better.
>
> I think what's happening here is that with dynamic_mt_modes=0 the
> system alternates between the two guests, whereas with
> dynamic_mt_modes=6 it will spend some of the time running both guests
> simultaneously in two-way split mode. Since you have two
> compute-bound guests that each have threads=1 and 8 vcpus, it can fill
> up the core either way. In that case it is more efficient to fill up
> the core with vcpus from one guest and not have to split the core,
> firstly because you avoid the split/unsplit latency and secondly
> because the threads run a little faster in whole-core mode than in
> split-core.

Thank you for the explanation.

So it makes more sense to have vCPUs with threads?

It seems so: I ran the same tests with 4 vCPUs x 2 threads x 2 guests
concurrently on one 8-threaded bare metal core.

target_smt_mode 0
dynamic_mt_modes 0

real    35m33.142s      real    35m44.967s
user    167m16.971s     user    163m14.320s
sys     84m19.618s      sys     90m38.978s

target_smt_mode 0
dynamic_mt_modes 6

real    26m41.993s      real    25m54.270s
user    130m31.362s     user    127m55.145s
sys     58m17.378s      sys     55m10.202s

In this case, it really improves performance.

Laurent
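
As a rough cross-check of the figures in this thread, the wall-clock ("real") times can be converted to seconds and compared. This is only a sketch: the helper names (to_seconds, mean_seconds, relative_change) are made up for illustration, and averaging the two guests' times is an assumption about how to summarise each run, not something the tests themselves prescribe.

```python
import re


def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '35m33.142s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def mean_seconds(stamps):
    """Average the per-guest times of one run (assumed summary statistic)."""
    return sum(to_seconds(s) for s in stamps) / len(stamps)


def relative_change(before, after):
    """Fractional change in mean time going from 'before' to 'after'."""
    base = mean_seconds(before)
    return (mean_seconds(after) - base) / base


# 'real' times for the two guests, copied from the results in this thread.
threads1_dyn0 = ["30m49.100s", "30m48.161s"]  # threads=1, dynamic_mt_modes 0
threads1_dyn6 = ["32m13.338s", "32m26.652s"]  # threads=1, dynamic_mt_modes 6
threads2_dyn0 = ["35m33.142s", "35m44.967s"]  # threads=2, dynamic_mt_modes 0
threads2_dyn6 = ["26m41.993s", "25m54.270s"]  # threads=2, dynamic_mt_modes 6

if __name__ == "__main__":
    for label, before, after in [
        ("threads=1", threads1_dyn0, threads1_dyn6),
        ("threads=2", threads2_dyn0, threads2_dyn6),
    ]:
        change = relative_change(before, after)
        print(f"{label}: dynamic_mt_modes 0 -> 6 changes real time by {change:+.1%}")
```

With these numbers, enabling dynamic_mt_modes=6 costs roughly 5% of wall-clock time in the threads=1 case and saves roughly 26% in the threads=2 case, which matches the observations above.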