On Thu, Nov 02, 2023, Parshuram Sangle wrote:
> KVM halt polling interval growth and shrink behavior has evolved since its
> inception. The current mechanism adjusts the polling interval based on whether
> vcpu wakeup was received or not during polling interval using grow and shrink
> parameter values. Though grow parameter is logically set to 2 by default,
> shrink parameter is kept disabled (set to 0).
>
> Disabled shrink has two issues:
> 1) Resets polling interval to 0 on every un-successful poll assuming it is
> less likely to receive a vcpu wakeup in further shrunk intervals.
> 2) Even on successful poll, if total block time is greater or equal to current
> poll_ns value, polling interval is reset to 0 instead shrinking gradually.
>
> These aspects reduce the chances receiving valid wakeup during polling and
> lose potential performance benefits for VM workloads.
>
> Below is the summary of experiments conducted to assess performance and power
> impact by enabling the halt_poll_ns_shrink parameter(value set to 2).
>
> Performance Test Summary: (Higher is better)
> --------------------------------------------
> Platform Details: Chrome Brya platform
> CPU - Alder Lake (12th Gen Intel CPU i7-1255U)
> Host kernel version - 5.15.127-20371-g710a1611ad33
>
> Android VM workload (Score)     Base      Shrink Enabled (value 2)    Delta
> ---------------------------------------------------------------------------
> GeekBench Multi-core(CPU)       5754      5856                        2%
> 3D Mark Slingshot(CPU+GPU)      15486     15885                       3%
> Stream (handopt)(Memory)        20566     21594                       5%
> fio seq-read (Storage)          727       747                         3%
> fio seq-write (Storage)         331       343                         3%
> fio rand-read (Storage)         690       732                         6%
> fio rand-write (Storage)        299       300                         1%
>
> Steam Gaming VM (Avg FPS)       Base      Shrink Enabled (value 2)    Delta
> ---------------------------------------------------------------------------
> Metro Redux (OpenGL)            54.80     59.60                       9%
> Dota 2 (Open GL)                48.74     51.40                       5%
> Dota 2 (Vulkan)                 20.80     21.10                       1%
> SpaceShip (Vulkan)              20.40     21.52                       6%
>
> With Shrink enabled, majority of workloads show higher % of successful polling.
> Reduced latency of returning control back to VM and avoided overhead of vm_exit
> contribute to these performance gains.
>
> Power Impact Assessment Summary: (Lower is better)
> --------------------------------------------------
> Method : DAQ measurements of CPU and Memory rails
>
> CPU+Memory (Watt)               Base      Shrink Enabled (value 2)    Delta
> ---------------------------------------------------------------------------
> Idle* (Host)                    0.636     0.631                       -0.8%
> Video Playback (Host)           2.225     2.210                       -0.7%
> Tomb Raider (VM)                17.261    17.175                      -0.5%
> SpaceShip Benchmark(VM)         17.079    17.123                      0.3%
>
> *Idle power - Idle system with no application running, Android and Borealis
> VMs enabled running no workload. Duration 180 sec.
>
> Power measurements done for Chrome idle scenario and active Gaming VM
> workload show negligible power overhead since additional polling creates
> very short duration bursts which are less likely to have gone to a
> complete idle CPU state.
>
> NOTE: No tests are conducted on non-x86 platform with this changed config
>
> The default values of grow and shrink parameters get commonly used by
> various VM deployments unless specifically tuned for performance. Hence
> referring to performance and power measurements results shown above, it is
> recommended to have shrink enabled (with value 2) by default so that there
> is no need to explicitly set this parameter through kernel cmdline or by
> other means.

I am by no means an expert on halt polling or power management, but all of this
seems like a reasonable tradeoff.  And even without the numbers you provided,
starting from scratch after a single failure is rather odd.

So unless someone objects, I'll plan on applying this for 6.11 in a few weeks
(after the 6.10 merge window closes).
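
For illustration only, the difference between a disabled and an enabled shrink
can be modelled with a small user-space sketch.  This is not the kernel code:
the function name shrink_poll_ns(), the grow_start floor and the sample values
below are assumptions made for the example.

```c
/* Simplified user-space model of the shrink step; not the kernel source. */

/*
 * Return the next polling interval (in ns) after a poll that should shrink.
 *
 * poll_ns:    current per-vCPU polling interval
 * shrink:     divisor, modelled on halt_poll_ns_shrink (0 means disabled)
 * grow_start: hypothetical floor below which polling is switched off
 */
static unsigned int shrink_poll_ns(unsigned int poll_ns, unsigned int shrink,
				   unsigned int grow_start)
{
	unsigned int val;

	if (shrink == 0)
		val = 0;		/* disabled: start over from zero */
	else
		val = poll_ns / shrink;	/* enabled: back off gradually */

	if (val < grow_start)
		val = 0;		/* too small to be worth polling */

	return val;
}
```

Under these assumptions, with shrink set to 0 an interval of 200000 ns drops
straight to 0 after one failed poll, whereas with shrink set to 2 it steps down
to 100000, then 50000, and so on until it falls under the floor.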