On 06/04/21 10:49, Mark Rutland wrote: > On Thu, Jun 03, 2021 at 06:40:57PM +0100, Will Deacon wrote: > > On Thu, Jun 03, 2021 at 01:58:56PM +0100, Mark Rutland wrote: > > > On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote: > > > > If we want to support 32-bit applications, then when we identify a CPU > > > > with mismatched 32-bit EL0 support we must ensure that we will always > > > > have an active 32-bit CPU available to us from then on. This is important > > > > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit > > > > CPUs for compat tasks and forced migration due to a hotplug event will > > > > hang if no 32-bit CPUs are available. > > > > > > > > On detecting a mismatch, prevent offlining of either the mismatching CPU > > > > if it is 32-bit capable, or find the first active 32-bit capable CPU > > > > otherwise. > > > > > > > > Reviewed-by: Catalin Marinas <catalin.marinas@xxxxxxx> > > > > Signed-off-by: Will Deacon <will@xxxxxxxxxx> > > > > --- > > > > arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++- > > > > 1 file changed, 19 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > > > > index 4194a47de62d..b31d7a1eaed6 100644 > > > > --- a/arch/arm64/kernel/cpufeature.c > > > > +++ b/arch/arm64/kernel/cpufeature.c > > > > @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void) > > > > > > > > static int enable_mismatched_32bit_el0(unsigned int cpu) > > > > { > > > > + static int lucky_winner = -1; > > > > > > This is cute, but could we please give it a meaningful name, e.g. > > > `pinned_cpu` ? > > > > I really don't see the problem, nor why it's "cute". > > > > Tell you what, I'll add a comment instead: > > > > /* > > * The first 32-bit-capable CPU we detected and so can no longer > > * be offlined by userspace. -1 indicates we haven't yet onlined > > * a 32-bit-capable CPU. > > */ > > Thanks for the comment; that's helpful. > > However, my concern here is that when we inevitably have to discuss this > with others in future, "lucky winner" is jarring (and also unclear to > those where English is not their native language). For clarity, it would > be really nice to use a term like "cpu", "chosen_cpu", "pinned_cpu", > etc. > > However, you're the maintainer; choose what you think is appropriate. > > > > > struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu); > > > > bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0); > > > > > > > > if (cpu_32bit) { > > > > cpumask_set_cpu(cpu, cpu_32bit_el0_mask); > > > > static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0); > > > > - setup_elf_hwcaps(compat_elf_hwcaps); > > > > } > > > > > > > > + if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit) > > > > + return 0; > > > > + > > > > + if (lucky_winner >= 0) > > > > + return 0; > > > > + > > > > + /* > > > > + * We've detected a mismatch. We need to keep one of our CPUs with > > > > + * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting > > > > + * every CPU in the system for a 32-bit task. > > > > + */ > > > > + lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask, > > > > + cpu_active_mask); > > > > + get_cpu_device(lucky_winner)->offline_disabled = true; > > > > + setup_elf_hwcaps(compat_elf_hwcaps); > > > > + pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n", > > > > + cpu, lucky_winner); > > > > return 0; > > > > } > > > > > > I guess this is going to play havoc with kexec and hibernate. :/ > > > > The kernel can still offline the CPUs (see the whole freezer mess that I > > linked to in the cover letter). What specific havoc are you thinking of? > > Ah. If this is just inhibiting userspace-driven offlining, that sounds > fine. > > For kexec, I was concerned that either this would inhibit kexec, or > smp_shutdown_nonboot_cpus() would fail to offline the pinned CPU, and > that'd trigger a BUG(), which would be unfortunate. > > For hibernate, the equivalent is freeze_secondary_cpus(), which I guess > is dealt with by the freezer bits you mention. ()->offline_disabled will only block offline requests performed by device_offline(). kexec, hibernate, suspend/resume use cpu_online/offline() directly so won't be impacted by that. I have sent patches that make cpu_online/offline() 'private' and not used or exported outside of cpu subsystem and the odd support function for arch code. All other users use device_offline() or add/remove_cpu() now. I have made sure to test kexec, suspend to disk and ram in my older similar implementation in the past. So we should be good. Cheers -- Qais Yousef