Hi Alexandru, On 11/25/20 4:51 PM, Alexandru Elisei wrote: > The LPI code validates a result similarly to the IPI tests, by checking if > the target CPU received the interrupt with the expected interrupt number. > However, the LPI tests invent their own way of checking the test results by > creating a global struct (lpi_stats), using a separate interrupt handler > (lpi_handler) and test function (check_lpi_stats). > > There are several areas that can be improved in the LPI code, which are > already covered by the IPI tests: > > - check_lpi_stats() doesn't take into account that the target CPU can > receive the correct interrupt multiple times. > - check_lpi_stats() doesn't take into the account the scenarios where all > online CPUs can receive the interrupt, but the target CPU is the last CPU > that touches lpi_stats.observed. > - Insufficient or missing memory synchronization. > > Instead of duplicating code, let's convert the LPI tests to use > check_acked() and the same interrupt handler as the IPI tests, which has > been renamed to irq_handler() to avoid any confusion. > > check_lpi_stats() has been replaced with check_acked() which, together with > using irq_handler(), instantly gives us more correctness checks and proper > memory synchronization between threads. lpi_stats.expected has been > replaced by the CPU mask and the expected interrupt number arguments to > check_acked(), with no change in semantics. > > lpi_handler() aborted the test if the interrupt number was not an LPI. This > was changed in favor of allowing the test to continue, as it will fail in > check_acked(), but possibly print information useful for debugging. If the > test receives spurious interrupts, those are reported via report_info() at > the end of the test for consistency with the IPI tests, which don't treat > spurious interrupts as critical errors. > > In the spirit of code reuse, secondary_lpi_tests() has been replaced with > ipi_recv() because the two are now identical; ipi_recv() has been renamed > to irq_recv(), similarly to irq_handler(), to avoid confusion. > > CC: Eric Auger <eric.auger@xxxxxxxxxx> > Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx> > --- > With this change, I get the following failure for its-trigger on a > rockpro64 (running on the little cores): > > $ taskset -c 0-3 arm/run arm/gic.flat -smp 4 -machine gic-version=3 -append its-trigger > /usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio -kernel arm/gic.flat -smp 4 -machine gic-version=3 -append its-trigger # -initrd /tmp/tmp.wWW0iJY6DS > ITS: MAPD devid=2 size = 0x8 itt=0x403a0000 valid=1 > ITS: MAPD devid=7 size = 0x8 itt=0x403b0000 valid=1 > MAPC col_id=3 target_addr = 0x30000 valid=1 > MAPC col_id=2 target_addr = 0x20000 valid=1 > INVALL col_id=2 > INVALL col_id=3 > MAPTI dev_id=2 event_id=20 -> phys_id=8195, col_id=3 > MAPTI dev_id=7 event_id=255 -> phys_id=8196, col_id=2 > INT dev_id=2 event_id=20 > PASS: gicv3: its-trigger: int: dev=2, eventid=20 -> lpi= 8195, col=3 > INT dev_id=7 event_id=255 > PASS: gicv3: its-trigger: int: dev=7, eventid=255 -> lpi= 8196, col=2 > INV dev_id=2 event_id=20 > INT dev_id=2 event_id=20 > PASS: gicv3: its-trigger: inv/invall: dev2/eventid=20 does not trigger any LPI > INT dev_id=2 event_id=20 > PASS: gicv3: its-trigger: inv/invall: dev2/eventid=20 still does not trigger any LPI > INVALL col_id=3 > INT dev_id=2 event_id=20 > INFO: gicv3: its-trigger: inv/invall: ACKS: missing=0 extra=1 unexpected=0 > FAIL: gicv3: its-trigger: inv/invall: dev2/eventid=20 now triggers an LPI > ITS: MAPD devid=2 size = 0x8 itt=0x403a0000 valid=0 > INT dev_id=2 event_id=20 > PASS: gicv3: its-trigger: mapd valid=false: no LPI after device unmap > SUMMARY: 6 tests, 1 unexpected failures > > The reason for the failure is that the test "dev2/eventid=20 now triggers > an LPI" triggers 2 LPIs, not one. This behavior was present before this > patch, but it was ignored because check_lpi_stats() wasn't looking at the > acked array. > > I'm not familiar with the ITS so I'm not sure if this is expected, if the > test is incorrect or if there is something wrong with KVM emulation. > > Did some more testing on an Ampere eMAG (fast out-of-order cores) using > qemu and kvmtool and Linux v5.8, here's what I found: > > - Using qemu and gic.flat built from *master*: error encountered 864 times > out of 1088 runs. > - Using qemu: error encountered 852 times out of 1027 runs. > - Using kvmtool: error encountered 8164 times out of 10602 runs. > > Looks to me like it's consistent between master and this series, and > between qemu and kvmtool. > > Here's the diff that I used for testing master (I removed the diff line > because it causes trouble when applying the main patch): > > @@ -772,8 +772,12 @@ static void test_its_trigger(void) > /* Now call the invall and check the LPI hits */ > its_send_invall(col3); > lpi_stats_expect(3, 8195); > + acked[3] = 0; > + dsb(ishst); > its_send_int(dev2, 20); > check_lpi_stats("dev2/eventid=20 now triggers an LPI"); > + report_info("acked[3] = %d", acked[3]); > + report(acked[3] == 1, "dev2/eventid=20 received one interrupt"); > > report_prefix_pop(); > > > arm/gic.c | 185 ++++++++++++++++++++++++++---------------------------- > 1 file changed, 88 insertions(+), 97 deletions(-) > > diff --git a/arm/gic.c b/arm/gic.c > index da7b42da5449..6e93da80fe0d 100644 > --- a/arm/gic.c > +++ b/arm/gic.c > @@ -111,7 +111,7 @@ static bool check_acked(cpumask_t *mask, int sender, int irqnum) > } > if (!acked[cpu]) > continue; > - smp_rmb(); /* pairs with smp_wmb in ipi_handler */ > + smp_rmb(); /* pairs with smp_wmb in irq_handler */ > > if (has_gicv2 && irq_sender[cpu] != sender) { > report_info("cpu%d received IPI from wrong sender %d", > @@ -149,11 +149,12 @@ static void check_spurious(void) > static int gic_get_sender(int irqstat) > { > if (gic_version() == 2) > + /* GICC_IAR.CPUID is RAZ for non-SGIs */ > return (irqstat >> 10) & 7; > return -1; > } > > -static void ipi_handler(struct pt_regs *regs __unused) > +static void irq_handler(struct pt_regs *regs __unused) > { > u32 irqstat = gic_read_iar(); > u32 irqnr = gic_iar_irqnr(irqstat); > @@ -192,75 +193,6 @@ static void setup_irq(irq_handler_fn handler) > } > > #if defined(__aarch64__) > -struct its_event { > - int cpu_id; > - int lpi_id; > -}; > - > -struct its_stats { > - struct its_event expected; > - struct its_event observed; > -}; > - > -static struct its_stats lpi_stats; > - > -static void lpi_handler(struct pt_regs *regs __unused) > -{ > - u32 irqstat = gic_read_iar(); > - int irqnr = gic_iar_irqnr(irqstat); > - > - gic_write_eoir(irqstat); > - assert(irqnr >= 8192); > - smp_rmb(); /* pairs with wmb in lpi_stats_expect */ > - lpi_stats.observed.cpu_id = smp_processor_id(); > - lpi_stats.observed.lpi_id = irqnr; > - acked[lpi_stats.observed.cpu_id]++; > - smp_wmb(); /* pairs with rmb in check_lpi_stats */ > -} > - > -static void lpi_stats_expect(int exp_cpu_id, int exp_lpi_id) > -{ > - lpi_stats.expected.cpu_id = exp_cpu_id; > - lpi_stats.expected.lpi_id = exp_lpi_id; > - lpi_stats.observed.cpu_id = -1; > - lpi_stats.observed.lpi_id = -1; > - smp_wmb(); /* pairs with rmb in handler */ > -} > - > -static void check_lpi_stats(const char *msg) > -{ > - int i; > - > - for (i = 0; i < 50; i++) { > - mdelay(100); > - smp_rmb(); /* pairs with wmb in lpi_handler */ > - if (lpi_stats.observed.cpu_id == lpi_stats.expected.cpu_id && > - lpi_stats.observed.lpi_id == lpi_stats.expected.lpi_id) { > - report(true, "%s", msg); > - return; > - } > - } > - > - if (lpi_stats.observed.cpu_id == -1 && lpi_stats.observed.lpi_id == -1) { > - report_info("No LPI received whereas (cpuid=%d, intid=%d) " > - "was expected", lpi_stats.expected.cpu_id, > - lpi_stats.expected.lpi_id); > - } else { > - report_info("Unexpected LPI (cpuid=%d, intid=%d)", > - lpi_stats.observed.cpu_id, > - lpi_stats.observed.lpi_id); > - } > - report(false, "%s", msg); > -} > - > -static void secondary_lpi_test(void) > -{ > - setup_irq(lpi_handler); > - cpumask_set_cpu(smp_processor_id(), &ready); > - while (1) > - wfi(); > -} > - > static void check_lpi_hits(int *expected, const char *msg) > { > bool pass = true; > @@ -347,7 +279,7 @@ static void ipi_test_smp(void) > > static void ipi_send(void) > { > - setup_irq(ipi_handler); > + setup_irq(irq_handler); > wait_on_ready(); > ipi_test_self(); > ipi_test_smp(); > @@ -355,9 +287,9 @@ static void ipi_send(void) > exit(report_summary()); > } > > -static void ipi_recv(void) > +static void irq_recv(void) > { > - setup_irq(ipi_handler); > + setup_irq(irq_handler); > cpumask_set_cpu(smp_processor_id(), &ready); > while (1) > wfi(); > @@ -368,7 +300,7 @@ static void ipi_test(void *data __unused) > if (smp_processor_id() == IPI_SENDER) > ipi_send(); > else > - ipi_recv(); > + irq_recv(); > } > > static struct gic gicv2 = { > @@ -698,12 +630,12 @@ static int its_prerequisites(int nb_cpus) > > stats_reset(); > > - setup_irq(lpi_handler); > + setup_irq(irq_handler); > > for_each_present_cpu(cpu) { > if (cpu == 0) > continue; > - smp_boot_secondary(cpu, secondary_lpi_test); > + smp_boot_secondary(cpu, irq_recv); > } > wait_on_ready(); > > @@ -757,6 +689,7 @@ static void test_its_trigger(void) > { > struct its_collection *col3; > struct its_device *dev2, *dev7; > + cpumask_t mask; > > if (its_setup1()) > return; > @@ -767,13 +700,27 @@ static void test_its_trigger(void) > > report_prefix_push("int"); > > - lpi_stats_expect(3, 8195); > + stats_reset(); > + /* > + * its_send_int() is missing the synchronization from the GICv3 IPI > + * trigger functions. > + */ > + wmb(); so don't you want to add it in __its_send_int instead? Eric > + cpumask_clear(&mask); > + cpumask_set_cpu(3, &mask); > its_send_int(dev2, 20); > - check_lpi_stats("dev=2, eventid=20 -> lpi= 8195, col=3"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8195), > + "dev=2, eventid=20 -> lpi= 8195, col=3"); > > - lpi_stats_expect(2, 8196); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > + cpumask_set_cpu(2, &mask); > its_send_int(dev7, 255); > - check_lpi_stats("dev=7, eventid=255 -> lpi= 8196, col=2"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8196), > + "dev=7, eventid=255 -> lpi= 8196, col=2"); > > report_prefix_pop(); > > @@ -786,9 +733,13 @@ static void test_its_trigger(void) > gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT & ~LPI_PROP_ENABLED); > its_send_inv(dev2, 20); > > - lpi_stats_expect(-1, -1); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > its_send_int(dev2, 20); > - check_lpi_stats("dev2/eventid=20 does not trigger any LPI"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, -1, -1), > + "dev2/eventid=20 does not trigger any LPI"); > > /* > * re-enable the LPI but willingly do not call invall > @@ -796,15 +747,24 @@ static void test_its_trigger(void) > * The LPI should not hit > */ > gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT); > - lpi_stats_expect(-1, -1); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > its_send_int(dev2, 20); > - check_lpi_stats("dev2/eventid=20 still does not trigger any LPI"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, -1, -1), > + "dev2/eventid=20 still does not trigger any LPI"); > > /* Now call the invall and check the LPI hits */ > its_send_invall(col3); > - lpi_stats_expect(3, 8195); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > + cpumask_set_cpu(3, &mask); > its_send_int(dev2, 20); > - check_lpi_stats("dev2/eventid=20 now triggers an LPI"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8195), > + "dev2/eventid=20 now triggers an LPI"); > > report_prefix_pop(); > > @@ -815,9 +775,14 @@ static void test_its_trigger(void) > */ > > its_send_mapd(dev2, false); > - lpi_stats_expect(-1, -1); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > its_send_int(dev2, 20); > - check_lpi_stats("no LPI after device unmap"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, -1, -1), "no LPI after device unmap"); > + > + check_spurious(); > report_prefix_pop(); > } > > @@ -825,6 +790,7 @@ static void test_its_migration(void) > { > struct its_device *dev2, *dev7; > bool test_skipped = false; > + cpumask_t mask; > > if (its_setup1()) { > test_skipped = true; > @@ -841,13 +807,25 @@ do_migrate: > if (test_skipped) > return; > > - lpi_stats_expect(3, 8195); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > + cpumask_set_cpu(3, &mask); > its_send_int(dev2, 20); > - check_lpi_stats("dev2/eventid=20 triggers LPI 8195 on PE #3 after migration"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8195), > + "dev2/eventid=20 triggers LPI 8195 on PE #3 after migration"); > > - lpi_stats_expect(2, 8196); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > + cpumask_set_cpu(2, &mask); > its_send_int(dev7, 255); > - check_lpi_stats("dev7/eventid=255 triggers LPI 8196 on PE #2 after migration"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8196), > + "dev7/eventid=255 triggers LPI 8196 on PE #2 after migration"); > + > + check_spurious(); > } > > #define ERRATA_UNMAPPED_COLLECTIONS "ERRATA_8c58be34494b" > @@ -857,6 +835,7 @@ static void test_migrate_unmapped_collection(void) > struct its_collection *col = NULL; > struct its_device *dev2 = NULL, *dev7 = NULL; > bool test_skipped = false; > + cpumask_t mask; > int pe0 = 0; > u8 config; > > @@ -891,17 +870,29 @@ do_migrate: > its_send_mapc(col, true); > its_send_invall(col); > > - lpi_stats_expect(2, 8196); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > + cpumask_set_cpu(2, &mask); > its_send_int(dev7, 255); > - check_lpi_stats("dev7/eventid= 255 triggered LPI 8196 on PE #2"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8196), > + "dev7/eventid= 255 triggered LPI 8196 on PE #2"); > > config = gicv3_lpi_get_config(8192); > report(config == LPI_PROP_DEFAULT, > "Config of LPI 8192 was properly migrated"); > > - lpi_stats_expect(pe0, 8192); > + stats_reset(); > + wmb(); > + cpumask_clear(&mask); > + cpumask_set_cpu(pe0, &mask); > its_send_int(dev2, 0); > - check_lpi_stats("dev2/eventid = 0 triggered LPI 8192 on PE0"); > + wait_for_interrupts(&mask); > + report(check_acked(&mask, 0, 8192), > + "dev2/eventid = 0 triggered LPI 8192 on PE0"); > + > + check_spurious(); > } > > static void test_its_pending_migration(void) >