On 16/10/19 18:50, Andrea Arcangeli wrote:
>> It still doesn't add up.  0.3ms / 5 is 1/15000th of a second; 43us is
>> 1/25000th of a second.  Do you have multiple vCPU perhaps?
>
> Why would I run any test on UP guests? Rather then spending time doing
> the math on my results, it's probably quicker that you run it yourself:

I don't know, but if you don't say how many vCPUs you have, I cannot do
the math and review the patch.

>> The number of vmexits doesn't count (for HLT).  What counts is how long
>> they take to be serviced, and as long as it's 1us or more the
>> optimization is pointless.
>
> Please note the single_task_running() check which immediately breaks
> the kvm_vcpu_check_block() loop if there's even a single other task
> that can be scheduled in the runqueue of the host CPU.
>
> What happen when the host is not idle is quoted below:
>
>          w/o optimization          with optimization
>          ----------------------    -------------------------
> 0us      vmexit                    vmexit
> 500ns    retpoline                 call vmexit handler directly
> 600ns    retpoline                 kvm_vcpu_check_block()
> 700ns    retpoline                 schedule()
> 800ns    kvm_vcpu_check_block()
> 900ns    schedule()
> ...
>
> Disclaimer: the numbers on the left are arbitrary and I just cut and
> pasted them from yours, no idea how far off they are.

Yes, of course.  But the idea is the same: yes, because of the retpoline
you run the guest for perhaps 300ns more before schedule()ing, but does
that really matter?  300ns * 20000 times/second is a 0.6% performance
impact, and 300ns is already very generous.  I am not sure it would be
measurable at all.

Paolo

> To be clear, I would find it very reasonable to be requested to proof
> the benefit of the HLT optimization with benchmarks specifics for that
> single one liner, but until then, the idea that we can drop the
> retpoline optimization from the HLT vmexit by just thinking about it,
> still doesn't make sense to me, because by thinking about it I come to
> the opposite conclusion.
>
> The lack of single_task_running() in the guest driver is also why the
> guest cpuidle haltpoll risks to waste some CPU with host overcommit or
> with the host loaded at full capacity and why we may not assume it to
> be universally enabled.
>
> Thanks,
> Andrea
>
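
For reference, the 0.6% figure in the reply above is just the extra
per-exit latency multiplied by the exit rate.  A minimal sketch of that
arithmetic follows; the function name and its parameters are
illustrative only and do not come from the patch or from KVM, and the
300ns and 20000/second inputs are the rough estimates used in this
thread, not measurements.

```python
def overhead_fraction(extra_ns_per_exit: int, exits_per_second: int) -> float:
    """Fraction of each CPU-second spent on the extra per-exit latency.

    extra_ns_per_exit: additional nanoseconds spent per vmexit (assumed).
    exits_per_second:  number of such vmexits per second (assumed).
    """
    # One second is 1e9 nanoseconds; multiply as integers first to
    # avoid accumulating floating-point error before the division.
    return (extra_ns_per_exit * exits_per_second) / 1e9


# Estimates quoted in the thread: ~300ns extra per HLT vmexit at
# ~20000 exits per second.
impact = overhead_fraction(300, 20_000)
print(f"{impact:.1%}")
```

With these inputs the result is 0.006, i.e. 0.6% of one CPU; halving the
per-exit cost or the exit rate halves the impact.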