On Wed, Jun 17, 2015 at 07:30:09PM +0200, Laurent Vivier wrote:
>
> Tested-by: Laurent Vivier <lvivier@xxxxxxxxxx>
>
> Performance is better, but Paul could you explain why it is better if I disable dynamic micro-threading ?
> Did I miss something ?
>
> My test system is an IBM Power S822L.
>
> I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1) both
> attached on the same core (with pinning option of virt-manager). Then, I
> measure the time needed to compile a kernel in parallel in both guests
> with "make -j 16".
>
> My kernel without micro-threading:
>
> real    37m23.424s      real    37m24.959s
> user    167m31.474s     user    165m44.142s
> sys     113m26.195s     sys     113m45.072s
>
> With micro-threading patches (PATCH 1+2):
>
> target_smt_mode 0 [in fact It was 8 here, but it should behave like 0, as it is
> max threads/sub-core]
> dynamic_mt_modes 6
>
> real    32m13.338s      real    32m26.652s
> user    139m21.181s     user    140m20.994s
> sys     77m35.339s      sys     78m16.599s
>
> It's better, but if I disable dynamic micro-threading (but PATCH 1+2):
>
> target_smt_mode 0
> dynamic_mt_modes 0
>
> real    30m49.100s      real    30m48.161s
> user    144m22.989s     user    142m53.886s
> sys     65m4.942s       sys     66m8.159s
>
> it's even better.

I think what's happening here is that with dynamic_mt_modes=0 the
system alternates between the two guests, whereas with
dynamic_mt_modes=6 it will spend some of the time running both guests
simultaneously in two-way split mode.  Since you have two
compute-bound guests that each have threads=1 and 8 vcpus, it can fill
up the core either way.  In that case it is more efficient to fill up
the core with vcpus from one guest and not have to split the core,
firstly because you avoid the split/unsplit latency and secondly
because the threads run a little faster in whole-core mode than in
split-core.
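
To put rough numbers on the comparison, the "real" times quoted above for
the first guest can be converted to seconds and compared against the
unpatched baseline. This is only a small illustrative calculation on those
figures; the helper name to_seconds and the dictionary labels are made up
for the example. It works out to about 13.8% less wall-clock time with
dynamic_mt_modes=6 and about 17.6% less with dynamic_mt_modes=0.

```python
import re


def to_seconds(t):
    """Convert a time(1)-style string such as '37m23.424s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time string: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# 'real' times reported for the first guest in each configuration.
real = {
    "no micro-threading": "37m23.424s",
    "dynamic_mt_modes=6": "32m13.338s",
    "dynamic_mt_modes=0": "30m49.100s",
}

baseline = to_seconds(real["no micro-threading"])

# Fraction of wall-clock time saved relative to the unpatched kernel.
reduction = {name: 1 - to_seconds(t) / baseline for name, t in real.items()}

for name, r in reduction.items():
    print(f"{name}: {r:.1%} less wall-clock time than baseline")
```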

I am considering adding an additional heuristic, which would be to do
two passes through the list of preempted vcores: the first pass would
consider only vcores from the same guest as the primary vcore, and the
second pass would consider all vcores.  We could then also say that if
the first pass has collected 4 or more runnable vcpus, we don't bother
with the second pass.

Paul.
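
A rough sketch of the two-pass idea, in Python rather than kernel C and
not taken from any patch. Everything here is hypothetical: the Vcore
class, the function select_vcores, the capacity of 8 threads per core,
and the assumption that the "4 or more" count includes the primary
vcore's own vcpus. It also ignores the per-subcore constraints that
real split-core mode would impose.

```python
from dataclasses import dataclass


@dataclass
class Vcore:
    guest: str           # hypothetical guest identifier
    runnable_vcpus: int  # runnable vcpus on this vcore


def select_vcores(primary, preempted, capacity=8, enough=4):
    """Pick preempted vcores to run alongside the primary vcore.

    Returns (chosen vcores, total runnable vcpus including the primary's).
    """
    chosen = []  # indices into `preempted`, in the order they were picked
    total = primary.runnable_vcpus

    # Pass 1: only vcores belonging to the same guest as the primary.
    for i, vc in enumerate(preempted):
        if vc.guest == primary.guest and total + vc.runnable_vcpus <= capacity:
            chosen.append(i)
            total += vc.runnable_vcpus

    # Optional cut-off: enough work collected, so avoid mixing guests.
    if total >= enough:
        return [preempted[i] for i in chosen], total

    # Pass 2: consider every remaining vcore, whatever guest it is from.
    for i, vc in enumerate(preempted):
        if i not in chosen and total + vc.runnable_vcpus <= capacity:
            chosen.append(i)
            total += vc.runnable_vcpus

    return [preempted[i] for i in chosen], total


# Scenario from this thread: two guests, 8 single-threaded vcores each.
primary = Vcore("A", 1)
preempted = [Vcore("B", 1), Vcore("A", 1)] * 7 + [Vcore("B", 1)]
chosen, total = select_vcores(primary, preempted)
print(total, sorted({vc.guest for vc in chosen}))
```

With this input the first pass already fills the core from the primary
vcore's guest, so the second pass is skipped and no split would be
needed.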