On Tue, 2021-11-30 at 00:53 +0200, Maxim Levitsky wrote: > On Mon, 2021-11-29 at 20:18 +0100, Paolo Bonzini wrote: > > On 11/29/21 19:55, Sean Christopherson wrote: > > > > Still it does seem to be a race that happens when IS_RUNNING=true but > > > > vcpu->mode == OUTSIDE_GUEST_MODE. This patch makes the race easier to > > > > trigger because it moves IS_RUNNING=false later. > > > > > > Oh! Any chance the bug only repros with preemption enabled? That would explain > > > why I don't see problems, I'm pretty sure I've only run AVIC with a PREEMPT=n. > > > > Me too. > > > > > svm_vcpu_{un}blocking() are called with preemption enabled, and avic_set_running() > > > passes in vcpu->cpu. If the vCPU is preempted and scheduled in on a different CPU, > > > avic_vcpu_load() will overwrite the vCPU's entry with the wrong CPU info. > > > > That would make a lot of sense. avic_vcpu_load() can handle > > svm->avic_is_running = false, but avic_set_running still needs its body > > wrapped by preempt_disable/preempt_enable. > > > > Fedora's kernel is CONFIG_PREEMPT_VOLUNTARY, but I know Maxim uses his > > own build so it would not surprise me if he used CONFIG_PREEMPT=y. > > > > Paolo > > > > I will write ll the details tomorrow but I strongly suspect the CPU errata > https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf > #1235 > > Basically what I see that > > 1. vCPU2 disables is_running in avic physical id cache > 2. vCPU2 checks that IRR is empty and it is > 3. vCPU2 does schedule(); > > and it keeps on sleeping forever. If I kick it via signal > (like just doing 'info registers' qemu hmp command > or just stop/cont on the same hmp interface, the > vCPU wakes up and notices that IRR suddenly is not empty, > and the VM comes back to life (and then hangs after a while again > with the same problem....). > > As far as I see in the traces, the bit in IRR came from > another VCPU who didn't respect the ir_running bit and didn't get > AVIC_INCOMPLETE_IPI VMexit. > I can't 100% prove it yet, but everything in the trace shows this. > > About the rest of the environment, currently I reproduce this in > a VM which has no pci passed through devices at all, just AVIC. > (I wasn't able to reproduce it before just because I forgot to > enable AVIC in this configuration). > > So I also agree that Sean's patch is not to blame here, > it just made the window between setting is_running and getting to sleep > shorter and made it less likely that other vCPUs will pick up the is_running change. > (I suspect that they pick it up on next vmrun, and otherwise the value is somehow > cached wrongfully in them). > > A very performance killing workaround of kicking all vCPUs when one of them enters vcpu_block > does seem to work for me but it skews all the timing off so I can't prove it. > > That is all, I will write more detailed info, including some traces I have. > > I do use windows 10 with so called LatencyMon in it, which shows overall how > much latency hardware interrupts have, which used to be useful for me to > ensure that my VMs are suitable for RT like latency (even before I joined RedHat, > I tuned my VMs as much as I could to make my Rift CV1 VR headset work well which > needs RT like latencies. > > These days VR works fine in my VMs anyway, but I still kept this tool to keep an eye on it). > > I really need to write a kvm unit test to stress test IPIs, especially this case, > I will do this very soon. > > > Wei Huang, any info on this would be very helpful. > > Maybe putting the avic physical table in UC memory would help? > Maybe ringing doorbells of all other vcpus will help them notice the change? > > Best regards, > Maxim Levitsky Hi! I am now almost sure that this is errata #1235. I had attached a kvm-unit-test I wrote (patch against master of https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git/) which is able to reproduce the issue on stock 5.15.0 kernel (*no patches applied at all*) after just few seconds. If kvm is loaded without halt-polling (that is halt_poll_ns=0 is used). Halt polling and/or Sean's patch are not to blame, it just changes timeing. With Sean's patch I don't need to disable half polling. I did find few avic inhibition bugs that this test also finds and to make it work before I fix them, I added a workaround to not hit them in this test. I'll send patches to fix those very soon. Note that in windows VM there were no avic inhibitions so those bugs are not relevant. Wei Huang, do you know if this issue is fixed on Zen3, and if it is fixed on some Zen2 machines? Any workarounds other than 'don't use avic'? Best regards, Maxim Levitsky
From dba295579d10ff88b596697c861a7c83f5e9d013 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Date: Tue, 30 Nov 2021 13:56:57 +0200 Subject: [PATCH] add unit test for avic ipi Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> --- x86/Makefile.common | 3 +- x86/ipi_stress.c | 169 ++++++++++++++++++++++++++++++++++++++++++++ x86/unittests.cfg | 5 ++ 3 files changed, 176 insertions(+), 1 deletion(-) create mode 100644 x86/ipi_stress.c diff --git a/x86/Makefile.common b/x86/Makefile.common index 461de51..372d6ec 100644 --- a/x86/Makefile.common +++ b/x86/Makefile.common @@ -60,7 +60,8 @@ tests-common = $(TEST_DIR)/vmexit.flat $(TEST_DIR)/tsc.flat \ $(TEST_DIR)/init.flat $(TEST_DIR)/smap.flat \ $(TEST_DIR)/hyperv_synic.flat $(TEST_DIR)/hyperv_stimer.flat \ $(TEST_DIR)/hyperv_connections.flat \ - $(TEST_DIR)/umip.flat $(TEST_DIR)/tsx-ctrl.flat + $(TEST_DIR)/umip.flat $(TEST_DIR)/tsx-ctrl.flat \ + $(TEST_DIR)/ipi_stress.flat test_cases: $(tests-common) $(tests) diff --git a/x86/ipi_stress.c b/x86/ipi_stress.c new file mode 100644 index 0000000..50cbf39 --- /dev/null +++ b/x86/ipi_stress.c @@ -0,0 +1,169 @@ +#include "libcflat.h" +#include "smp.h" +#include "alloc.h" +#include "apic.h" +#include "processor.h" +#include "isr.h" +#include "asm/barrier.h" +#include "delay.h" +#include "atomic.h" + +static atomic_t running_cpus; + +volatile u64 *isr_counts; + +volatile bool finish_init; + +u64 num_outer_iterations = 5; +#define NUM_INNER_ITERATIONS 10000 +#define TOTAL_NUMBER_ITERATIONS (num_outer_iterations * NUM_INNER_ITERATIONS) + +static void ipi_interrupt_handler(isr_regs_t *r) +{ + isr_counts[smp_id()]++; + eoi(); +} + +static void wait_for_ipi(volatile u64 *count) +{ + u64 old_count = *count; + while (1) { + if ((rdtsc() >> 4) % 10000 != 0 ) { + asm volatile( + /* short window for interrupts */ + "sti\n" + "nop\n" + "cli\n" + ); + if (old_count != *count) + break; + } + + else { + asm volatile( + /* short window for interrupts */ + "sti\n" + "hlt\n" + "cli\n" + ); + + if (old_count != *count) + break; + } + }; +} + +static void vcpu_init(void *data) +{ + int cpu = (long)data; + + /* To make it easier to see iteration number in the trace */ + handle_irq(0x40, ipi_interrupt_handler); + handle_irq(0x50, ipi_interrupt_handler); + + atomic_inc(&running_cpus); + + if (cpu != 0) + while (!finish_init); +} + +static void vcpu_code(void *data) +{ + int ncpus = cpu_count(); + int cpu = (long)data; + u64 i; + + assert(cpu != 0); + if (cpu != 1) + wait_for_ipi(&isr_counts[cpu]); + + for (i = 0; i < TOTAL_NUMBER_ITERATIONS; i++) { + + u8 physical_dst = cpu == ncpus -1 ? 1 : cpu + 1; + + // send IPI to a next vCPU in a circular fashion + apic_icr_write(APIC_INT_ASSERT | + APIC_DEST_PHYSICAL | + APIC_DM_FIXED | + (i % 2 ? 0x40 : 0x50), + physical_dst); + + // wait for the IPI interrupt chain to come back to us + if (i < (TOTAL_NUMBER_ITERATIONS - 1) || cpu == 1) + wait_for_ipi(&isr_counts[cpu]); + + if (cpu == 1 && (i % NUM_INNER_ITERATIONS == 0)) + printf("."); + } +} + + + +int main(int argc, void** argv) +{ + int cpu, ncpus = cpu_count(); + + assert(ncpus > 2); + + if (argc > 1) + num_outer_iterations = atol(argv[1]); + + isr_counts = (volatile u64 *)calloc(ncpus, sizeof(u64)); + + printf("found %d cpus\n", ncpus); + printf("running for %lld iterations\n", + (long long unsigned int)TOTAL_NUMBER_ITERATIONS); + + /* + * Ensure that we don't have interrupt window pending + * from PIT timer which inhibits the AVIC. + */ + + for (cpu = 0; cpu < ncpus; ++cpu) + on_cpu_async(cpu, vcpu_init, (void *)(long)cpu); + + /* Workaround for apic->irr_pending bug vs avic inhibittion: + * Ensure that all vCPUs are running before uninhibiting the AVIC + * */ + + while (atomic_read(&running_cpus) < ncpus) + pause(); + + asm volatile("sti;nop;cli\n"); + + /* now let all the vCPUs end the IPI function*/ + finish_init = true; + + while (cpus_active() > 1) + pause(); + + printf("starting test on all cpus but 0...\n"); + + for (cpu = ncpus-1; cpu > 0; cpu--) + on_cpu_async(cpu, vcpu_code, (void *)(long)cpu); + + printf("test started, waiting to end...\n"); + + while (cpus_active() > 1) { + + u64 isr_count1 = isr_counts[1]; + + delay(5ULL*1000*1000*1000); + + if (isr_count1 == isr_counts[1]) { + printf("\n"); + printf("hang detected!!\n"); + break; + } + } + + printf("\n"); + + for (cpu = 1; cpu < ncpus; ++cpu) + report(isr_counts[cpu] == TOTAL_NUMBER_ITERATIONS, + "Number of IPIs match (%lld)", + (long long unsigned int)isr_counts[cpu]); + + free((void*)isr_counts); + return report_summary(); +} diff --git a/x86/unittests.cfg b/x86/unittests.cfg index 6244595..ff866b4 100644 --- a/x86/unittests.cfg +++ b/x86/unittests.cfg @@ -61,6 +61,11 @@ smp = 2 file = smptest.flat smp = 3 +[ipi_stress] +file = ipi_stress.flat +extra_params = -cpu host,-x2apic,-svm,-hypervisor -global kvm-pit.lost_tick_policy=discard -machine kernel-irqchip=on +smp = 4 + [vmexit_cpuid] file = vmexit.flat extra_params = -append 'cpuid' -- 2.26.3
_______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm