On Tue, Jan 31, 2023 at 09:52:56AM +0000, Alexandru Elisei wrote: > Hi Drew, > > On Tue, Jan 31, 2023 at 07:56:23AM +0100, Andrew Jones wrote: > > On Fri, Jan 27, 2023 at 05:59:16PM +0000, Alexandru Elisei wrote: > > > From: Nikita Venkatesh <Nikita.Venkatesh@xxxxxxx> > > > > > > The test uses the following method. > > > > > > The primary CPU brings up all the secondary CPUs, which are held in a wait > > > loop. Once the primary releases the CPUs, each of the secondary CPUs > > > proceed to issue CPU_OFF. > > > > > > The primary CPU then checks for the status of the individual CPU_OFF > > > request. There is a chance that some CPUs might return from the CPU_OFF > > > function call after the primary CPU has finished the scan. There is no > > > foolproof method to handle this, but the test tries its best to > > > eliminate these false positives by introducing an extra delay if all the > > > CPUs are reported offline after the initial scan. > > > > > > Signed-off-by: Nikita Venkatesh <Nikita.Venkatesh@xxxxxxx> > > > [ Alex E: Skip CPU_OFF test if CPU_ON failed, drop cpu_off_success in > > > favour of checking AFFINITY_INFO, commit message tweaking ] > > > Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx> > > > --- > > > > > > Decided to drop Drew's Reviewed-by tag because the changes are not trivial > > > from the previous version. > > > > > > arm/psci.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++---- > > > 1 file changed, 75 insertions(+), 5 deletions(-) > > > > > > diff --git a/arm/psci.c b/arm/psci.c > > > index f7238f8e0bbd..7034d8ebe6e1 100644 > > > --- a/arm/psci.c > > > +++ b/arm/psci.c > > > @@ -72,8 +72,9 @@ static bool psci_affinity_info_off(void) > > > } > > > > > > static int cpu_on_ret[NR_CPUS]; > > > -static cpumask_t cpu_on_ready, cpu_on_done; > > > +static cpumask_t cpu_on_ready, cpu_on_done, cpu_off_done; > > > static volatile int cpu_on_start; > > > +static volatile int cpu_off_start; > > > > > > extern void secondary_entry(void); > > > static void cpu_on_do_wake_target(void) > > > @@ -171,9 +172,71 @@ static bool psci_cpu_on_test(void) > > > return !failed; > > > } > > > > > > -int main(void) > > > +static void cpu_off_secondary_entry(void *data) > > > +{ > > > + int cpu = smp_processor_id(); > > > + > > > + while (!cpu_off_start) > > > + cpu_relax(); > > > + cpumask_set_cpu(cpu, &cpu_off_done); > > > + cpu_psci_cpu_die(); > > > +} > > > + > > > +static bool psci_cpu_off_test(void) > > > +{ > > > + bool failed = false; > > > + int i, count, cpu; > > > + > > > + for_each_present_cpu(cpu) { > > > + if (cpu == 0) > > > + continue; > > > + on_cpu_async(cpu, cpu_off_secondary_entry, NULL); > > > + } > > > + > > > + cpumask_set_cpu(0, &cpu_off_done); > > > + > > > + cpu_off_start = 1; > > > + report_info("waiting for the CPUs to be offlined..."); > > > + while (!cpumask_full(&cpu_off_done)) > > > + cpu_relax(); > > > + > > > + /* Allow all the other CPUs to complete the operation */ > > > + for (i = 0; i < 100; i++) { > > > + mdelay(10); > > > + > > > + count = 0; > > > + for_each_present_cpu(cpu) { > > > + if (cpu == 0) > > > + continue; > > > + if (psci_affinity_info(cpus[cpu], 0) != PSCI_0_2_AFFINITY_LEVEL_OFF) > > > + count++; > > > + } > > > + if (count > 0) > > > + continue; > > > > This should be > > > > if (count == 0) > > break; > > > > otherwise we never leave the loop early. > > Duh, don't know what I was thinking. Thanks for noticing it. > > > > > > + } > > > + > > > + /* Try to catch CPUs that return from CPU_OFF. */ > > > + if (count == 0) > > > + mdelay(100); > > > + > > > + for_each_present_cpu(cpu) { > > > + if (cpu == 0) > > > + continue; > > > + if (cpu_idle(cpu)) { > > > + report_info("CPU%d failed to be offlined", cpu); > > > + if (psci_affinity_info(cpus[cpu], 0) == PSCI_0_2_AFFINITY_LEVEL_OFF) > > > + report_info("AFFINITY_INFO incorrectly reports CPU%d as offline", cpu); > > > + failed = true; > > > + } > > > + } > > > + > > > + return !failed; > > > +} > > > + > > > +int main(int argc, char **argv) I just noticed we're adding argc,argv in this patch, but not using them. > > > { > > > int ver = psci_invoke(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0); > > > + bool cpu_on_success = true; > > > > > > report_prefix_push("psci"); > > > > > > @@ -188,10 +251,17 @@ int main(void) > > > report(psci_affinity_info_on(), "affinity-info-on"); > > > report(psci_affinity_info_off(), "affinity-info-off"); > > > > > > - if (ERRATA(6c7a5dce22b3)) > > > - report(psci_cpu_on_test(), "cpu-on"); > > > - else > > > + if (ERRATA(6c7a5dce22b3)) { > > > + cpu_on_success = psci_cpu_on_test(); > > > + report(cpu_on_success, "cpu-on"); > > > + } else { > > > report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable."); > > > + } > > > + > > > + if (!cpu_on_success) > > > + report_skip("Skipping cpu-off test because the cpu-on test failed"); > > > > We should output "was skipped" when the cpu-on test was skipped, rather > > than always reporting "failed". We need two booleans, try_cpu_on_test and > > cpu_on_success. > > This is not about cpu-on being a precondition for cpu-off. cpu-off makes > only one assumption, and that is that all secondaries can be onlined > successfully. Even if cpu-on is never run, cpu-off calls on_cpu_async, > which will online a secondary. This is safe even if the errata is not > present, because the errata is about concurrent CPU_ON calls for the same > VCPU, not for different VCPUs. > > The cpu-off test is skipped here because it can hang indefinitely if > onlining CPU1 was not successful: > > [..] > for_each_present_cpu(cpu) { > if (cpu == 0) > continue; > on_cpu_async(cpu, cpu_off_secondary_entry, NULL); > } > > cpumask_set_cpu(0, &cpu_off_done); > > cpu_off_start = 1; > report_info("waiting for the CPUs to be offlined..."); > while (!cpumask_full(&cpu_off_done)) // infinite loop if CPU1 > cpu_relax(); // cannot be onlined. > > Does that make sense? Should I add a comment to make it clear why cpu-off > is skipped when cpu-on fails? I missed that cpu_on_success was initialized to true. Seeing that now, I understand how the only time it's false is if the cpu-on test failed. When I thought it was initialized to false it had two ways to be false, failure or skip. I think it's a bit confusing to set a 'success' variable to true when the test is skipped. Also, we can relax the condition as to whether or not we try cpu-off by simply checking that all cpus, other than cpu0, are in idle. How about if (ERRATA(6c7a5dce22b3)) report(psci_cpu_on_test(), "cpu-on"); else report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable."); assert(!cpu_idle(0)); if (!ERRATA(6c7a5dce22b3) || cpumask_weight(&cpu_idle_mask) == nr_cpus - 1) report(psci_cpu_off_test(), "cpu-off"); else report_skip("Skipping cpu-off test because the cpu-on test failed"); Thanks, drew > > Thanks, > Alex > > > > > > + else > > > + report(psci_cpu_off_test(), "cpu-off"); > > > > > > done: > > > #if 0 > > > -- > > > 2.39.0 > > > > > > > Thanks, > > drew