> From: David Laight <David.Laight@xxxxxxxxxx>
> Sent: Wednesday, August 24, 2022 10:08 PM
> To: Mi, Dapeng1 <dapeng1.mi@xxxxxxxxx>; rafael@xxxxxxxxxx;
> daniel.lezcano@xxxxxxxxxx; pbonzini@xxxxxxxxxx
> Cc: linux-pm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> kvm@xxxxxxxxxxxxxxx; zhenyuw@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH] KVM: x86: use TPAUSE to replace PAUSE in halt polling
>
> From: Dapeng Mi
> > Sent: 24 August 2022 10:11
> >
> > TPAUSE is a new instruction on Intel processors which can instruct
> > processor enters a power/performance optimized state. Halt polling
> > uses PAUSE instruction to wait vCPU is waked up. The polling time
> > could be long and cause extra power consumption in some cases.
> >
> > Use TPAUSE to replace the PAUSE instruction in halt polling to get a
> > better power saving and performance.
>
> What is the effect on wakeup latency?
> Quite often that is far more important than a bit of power saving.

In theory, the added wakeup latency should be less than 1 us, so I expected
the latency impact to be minimal. I ran two scheduling-related benchmarks,
hackbench and schbench, and did not see any obvious performance impact from
this change.

While the two benchmarks ran on the host, an FIO workload ran simultaneously
in a Linux VM. FIO triggers a large number of HLT VM-exits, which in turn
trigger halt polling, so the results show how TPAUSE affects performance.

Here are the hackbench and schbench data on an Intel ADL platform.

Hackbench       base    TPAUSE  %delta
Group-1         0.056   0.052    7.14%
Group-4         0.165   0.164    0.61%
Group-8         0.313   0.309    1.28%
Group-16        0.834   0.842   -0.96%

Schbench - Latency percentiles (usec)
                    base    TPAUSE
./schbench -m 1
    50.0th          15      13
    99.0th          221     203
./schbench -m 2
    50.0th          26      23
    99.0th          16368   16544
./schbench -m 4
    50.0th          56      60
    99.0th          33984   34112

The schbench results are not very stable, but the base and TPAUSE data are
on the same level.

> The automatic entry of sleep states is a PITA already.
> Block 30 RT threads in cv_wait() and then do cv_broadcast().
> Use ftrace to see just how long it takes the last thread to wake up.

I think this test is similar to hackbench and schbench, so it should give
similar results. In any case, performance and power are a tradeoff; it
depends on which side we consider more important.

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1
> 1PT, UK Registration No: 1397386 (Wales)
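
For reference, the wake-up test suggested above could be approximated in
user space roughly as follows. This is only a sketch under assumptions, not
the exact test that was proposed: it maps cv_wait()/cv_broadcast() onto
pthread_cond_wait()/pthread_cond_broadcast(), it timestamps with
clock_gettime() instead of tracing with ftrace, and it does not give the
threads an RT scheduling policy (that would need SCHED_FIFO and suitable
privileges). The helper names (now_ns, waiter, last_wakeup_latency_ns) are
made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

#define NTHREADS 30                 /* thread count taken from the suggestion */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int waiting;                 /* threads that have reached the wait loop */
static int go;                      /* wake-up predicate, guarded by lock */
static int64_t wake_ns[NTHREADS];   /* per-thread wake-up timestamps */

/* Monotonic clock in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Block on the condition variable, then record when this thread ran again. */
static void *waiter(void *arg)
{
    int id = (int)(intptr_t)arg;

    pthread_mutex_lock(&lock);
    waiting++;
    while (!go)
        pthread_cond_wait(&cv, &lock);
    pthread_mutex_unlock(&lock);

    wake_ns[id] = now_ns();
    return NULL;
}

/*
 * Block NTHREADS threads, broadcast once, and return the time in ns from
 * just before the broadcast until the last thread recorded its wake-up.
 */
int64_t last_wakeup_latency_ns(void)
{
    pthread_t tid[NTHREADS];
    int64_t start, worst = 0;
    int i, rc, ready;

    waiting = 0;
    go = 0;

    for (i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&tid[i], NULL, waiter, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }

    /* Seeing waiting == NTHREADS under the lock means all are in cond_wait. */
    do {
        pthread_mutex_lock(&lock);
        ready = (waiting == NTHREADS);
        pthread_mutex_unlock(&lock);
        if (!ready)
            sched_yield();
    } while (!ready);

    pthread_mutex_lock(&lock);
    go = 1;
    start = now_ns();
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&lock);

    for (i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        if (wake_ns[i] - start > worst)
            worst = wake_ns[i] - start;
    }
    return worst;
}
```

Build with -pthread and call last_wakeup_latency_ns() in a loop, once on
the base kernel and once with the TPAUSE change, while the VM workload is
running. The number it returns also includes the time the threads spend
re-acquiring the mutex one after another, so it is only a rough stand-in
for what ftrace would show.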