On Thu, 25 Apr 2024 13:31:50 +0100 Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote: > On Wed, 24 Apr 2024 16:33:22 +0100 > Marc Zyngier <maz@xxxxxxxxxx> wrote: > > > On Wed, 24 Apr 2024 13:54:38 +0100, > > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote: > > > > > > On Tue, 23 Apr 2024 13:01:21 +0100 > > > Marc Zyngier <maz@xxxxxxxxxx> wrote: > > > > > > > On Mon, 22 Apr 2024 11:40:20 +0100, > > > > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote: > > > > > > > > > > On Thu, 18 Apr 2024 14:54:07 +0100 > > > > > Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> wrote: > > > > [...] > > > > > > > > > > > > > + /* > > > > > > + * Capable but disabled CPUs can be brought online later. What about > > > > > > + * the redistributor? ACPI doesn't want to say! > > > > > > + * Virtual hotplug systems can use the MADT's "always-on" GICR entries. > > > > > > + * Otherwise, prevent such CPUs from being brought online. > > > > > > + */ > > > > > > + if (!(gicc->flags & ACPI_MADT_ENABLED)) { > > > > > > + pr_warn_once("CPU %u's redistributor is inaccessible: this CPU can't be brought online\n", cpu); > > > > > > + set_cpu_present(cpu, false); > > > > > > + set_cpu_possible(cpu, false); > > > > > > + return 0; > > > > > > + } > > > > > > > > It seems dangerous to clear those this late in the game, given how > > > > disconnected from the architecture code this is. Are we sure that > > > > nothing has sampled these cpumasks beforehand? > > > > > > Hi Marc, > > > > > > Any firmware that does this is being considered as buggy already > > > but given it is firmware and the spec doesn't say much about this, > > > there is always the possibility. > > > > There is no shortage of broken firmware out there, and I expect this > > trend to progress. > > > > > Not much happens between the point where these are setup and > > > the point where the the gic inits and this code runs, but even if careful > > > review showed it was fine today, it will be fragile to future changes. > > > > > > I'm not sure there is a huge disadvantage for such broken firmware in > > > clearing these masks from the point of view of what is used throughout > > > the rest of the kernel. Here I think we are just looking to prevent the CPU > > > being onlined later. > > > > I totally agree on the goal, I simply question the way you get to it. > > > > > > > > We could add a set_cpu_broken() with appropriate mask. > > > Given this is very arm64 specific I'm not sure Rafael will be keen on > > > us checking such a mask in the generic ACPI code, but we could check it in > > > arch_register_cpu() and just not register the cpu if it matches. > > > That will cover the vCPU hotplug case. > > > > > > Does that sounds sensible, or would you prefer something else? > > > > > > Such a 'broken_rdists' mask is exactly what I have in mind, just > > keeping it private to the GIC driver, and not expose it anywhere else. > > You can then fail the hotplug event early, and avoid changing the > > global masks from within the GIC driver. At least, we don't mess with > > the internals of the kernel, and the CPU is properly marked as dead > > (that mechanism should already work). > > > > I'd expect the handling side to look like this (will not compile, but > > you'll get the idea): > Hi Marc, > > In general this looks good - but... > > I haven't gotten to the bottom of why yet (and it might be a side > effect of how I hacked the test by lying in minimal fashion and > just frigging the MADT read functions) but the hotplug flow is only getting > as far as calling __cpu_up() before it seems to enter an infinite loop. > That is it never gets far enough to fail this test. > > Getting stuck in a psci cpu_on call. I'm guessing something that > we didn't get to in the earlier gicv3 calls before bailing out is blocking that? > Looks like it gets to > SMCCC smc > and is never seen again. > > Any ideas on where to look? The one advantage so far of the higher level > approach is we never tried the hotplug callbacks at all so avoided hitting > that call. One (little bit horrible) solution that might avoid this would > be to add another cpuhp state very early on and fail at that stage. > I'm not keen on doing that without a better explanation than I have so far! Whilst it still doesn't work I suspect I'm loosing ability to print to the console between that point and somewhat later and real problem is elsewhere. Jonathan > > Thanks, > > J > > > > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c > > index 6fb276504bcc..e8f02bfd0e21 100644 > > --- a/drivers/irqchip/irq-gic-v3.c > > +++ b/drivers/irqchip/irq-gic-v3.c > > @@ -1009,6 +1009,9 @@ static int __gic_populate_rdist(struct redist_region *region, void __iomem *ptr) > > u64 typer; > > u32 aff; > > > > + if (cpumask_test_cpu(smp_processor_id(), &broken_rdists)) > > + return 1; > > + > > /* > > * Convert affinity to a 32bit value that can be matched to > > * GICR_TYPER bits [63:32]. > > @@ -1260,14 +1263,15 @@ static int gic_dist_supports_lpis(void) > > !gicv3_nolpi); > > } > > > > -static void gic_cpu_init(void) > > +static int gic_cpu_init(void) > > { > > void __iomem *rbase; > > - int i; > > + int ret, i; > > > > /* Register ourselves with the rest of the world */ > > - if (gic_populate_rdist()) > > - return; > > + ret = gic_populate_rdist(); > > + if (ret) > > + return ret; > > > > gic_enable_redist(true); > > > > @@ -1286,6 +1290,8 @@ static void gic_cpu_init(void) > > > > /* initialise system registers */ > > gic_cpu_sys_reg_init(); > > + > > + return 0; > > } > > > > #ifdef CONFIG_SMP > > @@ -1295,7 +1301,11 @@ static void gic_cpu_init(void) > > > > static int gic_starting_cpu(unsigned int cpu) > > { > > - gic_cpu_init(); > > + int ret; > > + > > + ret = gic_cpu_init(); > > + if (ret) > > + return ret; > > > > if (gic_dist_supports_lpis()) > > its_cpu_init(); > > > > But the question is: do you rely on these masks having been > > "corrected" anywhere else? > > > > Thanks, > > > > M. > > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel