Optimized clocksource with AMD AVIC enabled for Windows guest

[Resent; the previous message was not in plain-text format]
Hi KVM & AMD folks,
 
We are trying to enable AVIC for a Windows guest on an AMD host machine, on upstream kernel 5.8+. From our experiments and vmexit metrics, AVIC brings us huge benefits: interrupt vmexits drop by more than 80%, and vintr and write_cr8 vmexits are avoided entirely. But it seems that for a Windows guest we have to give up the Hyper-V paravirtual synthetic timer (the hv-stimer feature). So, to get the best of both worlds: is there a more optimized clocksource for Windows guests that can co-exist with AVIC enabled (given that stimer currently cannot work together with AVIC)?
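For context, our setup is roughly along the following lines; this is a sketch rather than our exact command line (the -smp/-m values are illustrative), and the hv-* flags are the standard QEMU Hyper-V enlightenment options:

  # Host side: AVIC is off by default in kvm_amd on 5.8, so enable it explicitly
  modprobe -r kvm_amd
  modprobe kvm_amd avic=1

  # AVIC case: keep the usual Hyper-V enlightenments (including hv-time) but drop
  # hv-synic/hv-stimer, since SynIC inhibits APICv/AVIC
  qemu-system-x86_64 -enable-kvm -machine q35,kernel-irqchip=on \
      -cpu host,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-time,hv-vpindex \
      -smp 8 -m 16G   # plus disks, NIC, etc.

  # stimer case: add SynIC and the synthetic timer, which in turn disables AVIC
  qemu-system-x86_64 -enable-kvm -machine q35,kernel-irqchip=on \
      -cpu host,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-time,hv-vpindex,hv-synic,hv-stimer \
      -smp 8 -m 16G   # plus disks, NIC, etc.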

Some detailed performance analysis below -
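(For reference, the tables below are the kind of output produced by perf kvm stat; the commands and durations here are illustrative, not the exact ones we ran:)

  perf kvm stat record -a sleep 60          # record vmexit events host-wide
  perf kvm stat report --event=vmexit       # per-exit-reason summary (first and third tables)
  perf kvm stat report --event=ioport       # per-IO-port breakdown (second table)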
 
From the KVM function kvm_hv_activate_synic (https://elixir.bootlin.com/linux/v5.8/source/arch/x86/kvm/hyperv.c#L891), enabling SynIC inhibits APICv (AVIC on AMD), and SynIC is in turn the prerequisite for stimer. In our experiments, without the Hyper-V stimer there are a lot of port IO vmexits, which hurt CPU-bound workloads: Geekbench, for example, regresses by around 10% in single-core performance. Below are the vmexit metrics when we enable AVIC but fall back to the Hyper-V clock and the RTC as timer sources, i.e. without stimer+synic.
 ------------------------------------------------------------------------------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                  io     575088    43.42%     1.96%      0.68us    100.62us      7.47us ( +-   0.13% )
                 msr     434530    32.81%     0.29%      0.41us    350.50us      1.45us ( +-   0.30% )
                 hlt     308635    23.30%    97.75%      0.43us   3791.74us    693.91us ( +-   0.12% )
           interrupt       4796     0.36%     0.00%      0.33us   1606.17us      1.89us ( +-  18.69% )
           write_cr4        752     0.06%     0.00%      0.53us     34.80us      1.42us ( +-   3.97% )
            read_cr4        376     0.03%     0.00%      0.40us      1.32us      0.62us ( +-   1.22% )
                 npf         85     0.01%     0.00%      1.68us     57.95us      8.33us ( +-  12.54% )
               pause         71     0.01%     0.00%      0.36us      1.44us      0.62us ( +-   3.45% )
               cpuid         50     0.00%     0.00%      0.33us      1.11us      0.45us ( +-   5.94% )
           hypercall         10     0.00%     0.00%      0.81us      1.42us      1.12us ( +-   5.87% )
                 nmi          1     0.00%     0.00%      0.67us      0.67us      0.67us ( +-   0.00% )
Total Samples:1324394, Total events handled time:219105470.74us.
-----------------------------------------------------------------------------------------------------------
This shows dramatically high IO vmexit counts; we can further break down which IO ports the Windows guest accessed:
-----------------------------------------------------
Analyze events for all VMs, all VCPUs:
 
      IO Port Access    Samples  Samples%     Time%    Min Time    Max Time         Avg time
 
           0x70:POUT     287544    50.00%    13.10%      0.40us     23.48us      0.53us ( +-   0.06% )
            0x71:PIN     226154    39.33%     7.60%      0.31us     22.91us      0.39us ( +-   0.08% )
           0x71:POUT      61390    10.67%    79.31%     12.92us     69.99us     14.95us ( +-   0.09% )
 
Total Samples:575088, Total events handled time:1156983.53us.
---------------------------------------------
Ports 0x70-0x71 are the RTC (rtc0/CMOS) ports, which means the guest is paying a heavy RTC access overhead. With stimer + SynIC on and AVIC disabled, the vmexit metrics look much better for both IO and MSR, as below.
-----------------------------------------
Analyze events for all VMs, all VCPUs:
             VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
                 hlt     166815    38.30%    99.66%      0.44us   1556.67us    809.48us ( +-   0.11% )
           interrupt     146218    33.57%     0.13%      0.30us   1362.10us      1.19us ( +-   1.50% )
                 msr     105267    24.17%     0.20%      0.37us     87.47us      2.51us ( +-   0.31% )
               vintr       9285     2.13%     0.01%      0.50us      1.92us      0.78us ( +-   0.16% )
           write_cr8       7537     1.73%     0.00%      0.31us     49.14us      0.66us ( +-   1.08% )
               cpuid        174     0.04%     0.00%      0.31us      1.39us      0.46us ( +-   3.21% )
                 npf        143     0.03%     0.00%      1.49us    237.66us     21.04us ( +-  12.04% )
           write_cr4         32     0.01%     0.00%      0.93us      5.78us      2.10us ( +-  11.38% )
               pause         22     0.01%     0.00%      0.45us      1.33us      0.84us ( +-   5.46% )
            read_cr4         16     0.00%     0.00%      0.47us      0.68us      0.60us ( +-   2.19% )
                 nmi         11     0.00%     0.00%      0.35us      0.70us      0.54us ( +-   5.06% )
           write_dr7          2     0.00%     0.00%      0.43us      0.45us      0.44us ( +-   2.27% )
           hypercall          1     0.00%     0.00%      0.97us      0.97us      0.97us ( +-   0.00% )
Total Samples:435523, Total events handled time:135488497.29us.
---------------------------------
From the above observations, we would like to find a way to enable AVIC while also keeping the most optimized clocksource for the Windows guest.
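For completeness, this is roughly how we confirm on the host which mode we end up in; the tracepoint name comes from the APICv update interface introduced around 5.6, and the exact invocation is only a sketch:

  cat /sys/module/kvm_amd/parameters/avic   # non-zero when AVIC is enabled in the module
  dmesg | grep -i avic                      # kvm_amd prints "AVIC enabled" at module load
  # Watch the APICv/AVIC inhibit being requested when the guest enables SynIC:
  perf trace -e kvm:kvm_apicv_update_request -a sleep 60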
 
Any suggestions would be really appreciated; looking forward to your response.

Best Regards,
Kechen
