KVM: x86/vPMU/AMD: Can we detect PMU is off for a VM?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi All,

as a followup for my patch: i have noticed that
currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM
(pmu->version == 0 in this case), but for AMD code pmu->version is never 0,
no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which results in "-cpu pmu=off" qemu option).

So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is impossible by design?

i can try implementing this, but need a direction.

Thank you in advance.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 09.11.2023 19:06, Konstantin Khorenko wrote:
We have detected significant performance drop of our atomic test which
checks the rate of CPUID instructions rate inside an L1 VM on an AMD
node.

Investigation led to 2 mainstream patches which have introduced extra
events accounting:

    018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions")
    9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")

And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID
rate.

Checking latest mainsteam kernel the performance difference is much less
but still quite noticeable: 13.4% and shows up on AMD CPUs only.

Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap
on Intel and expensive on AMD CPUs.

So the idea behind this patch is to skip iterations over PMCs at all in
case PMU is disabled for a VM completely or PMU is enabled for a VM, but
there are no active PMCs at all.

Unfortunately
  * current kernel code does not differentiate if PMU is globally enabled
    for a VM or not (pmu->version is always 1)
  * AMD CPUs older than Zen 4 do not support PMU v2 and thus efficient
    check for enabled PMCs is not possible

=> the patch speeds up vmexit for AMD Zen 4 CPUs only, this is sad.
    but the patch does not hurt other CPUs - and this is fortunate!

i have no access to a node with AMD Zen 4 CPU, so i had to test on
AMD Zen 3 CPU and i hope my expectations are right for AMD Zen 4.

i would appreciate if anyone perform the test of a real AMD Zen 4 node.

AMD performance results:
CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor

  * The test binary is run inside an AlmaLinux 9 VM with their stock kernel
    5.14.0-284.11.1.el9_2.x86_64.
  * Test binary checks the CPUID instractions rate (instructions per sec).
  * Default VM config (PMU is off, pmu->version is reported as 1).
  * The Host runs the kernel under test.

  # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \
    awk -e '{print $4;}' | \
    cut -f1 --delimiter='.' | \
    ./avg.sh

Measurements:
1. Host runs stock latest mainstream kernel commit 305230142ae0.
2. Host runs same mainstream kernel + current patch.
3. Host runs same mainstream kernel + current patch + force
    guest_pmu_is_enabled() to always return "false" using following change:

    -       if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))
    +       if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask))

    -----------------------------------------
    | Kernels       | CPUID rate            |
    -----------------------------------------
    | 1.            | 1360250               |
    | 2.            | 1365536 (+ 0.4%)      |
    | 3.            | 1541850 (+13.4%)      |
    -----------------------------------------

Measurement (2) gives some fluctuation, the performance is not increased
because the test was done on a Zen 3 CPU, so we are unable to use fast
check for active PMCs.
Measurement (3) shows expected performance boost on a Zen 4 CPU under
the same test.

The test used:
# cat at_cpu_cpuid.pub.cpp
/*
  * The test executes CPUID instruction in a loop and reports the calls rate.
  */

#include <stdio.h>
#include <time.h>

/* #define CPUID_EAX            0x80000002 */
#define CPUID_EAX               0x29a
#define CPUID_ECX               0

#define TEST_EXEC_SECS          30      // in seconds
#define LOOPS_APPROX_RATE       1000000

static inline void cpuid(unsigned int _eax, unsigned int _ecx)
{
         unsigned int regs[4] = {_eax, 0, _ecx, 0};

         asm __volatile__(
                 "cpuid"
                 : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
                 :  "0" (regs[0]),  "1" (regs[1]),  "2" (regs[2]),  "3" (regs[3])
                 : "memory");
}

double cpuid_rate_loops(int loops_num)
{
         int i;
         clock_t start_time, end_time;
         double spent_time, rate;

         start_time = clock();

         for (i = 0; i < loops_num; i++)
                 cpuid((unsigned int)CPUID_EAX, (unsigned int)CPUID_ECX);

         end_time = clock();
         spent_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;

         rate = (double)loops_num / spent_time;

         return rate;
}

int main(int argc, char* argv[])
{
         double approx_rate, rate;
         int loops;

         /* First we detect approximate CPUIDs rate. */
         approx_rate = cpuid_rate_loops(LOOPS_APPROX_RATE);

         /*
          * How many loops there should be in order to run the test for
          * TEST_EXEC_SECS seconds?
          */
         loops = (int)(approx_rate * TEST_EXEC_SECS);

         /* Get the precise instructions rate. */
         rate = cpuid_rate_loops(loops);

         printf( "CPUID instructions rate: %f instructions/second\n", rate);

         return 0;
}

Konstantin Khorenko (1):
   KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC

  arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++
  1 file changed, 26 insertions(+)





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux