On Tue, Nov 30, 2021 at 2:22 PM Ricardo Neri
<ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:
>
> On Wed, Nov 24, 2021 at 03:48:49PM +0100, Rafael J. Wysocki wrote:
> > On Sat, Nov 6, 2021 at 2:34 AM Ricardo Neri
> > <ricardo.neri-calderon@xxxxxxxxxxxxxxx> wrote:

[cut]

> > > +/**
> > > + * intel_hfi_offline() - Disable HFI on @cpu
> > > + * @cpu:	CPU in which the HFI will be disabled
> > > + *
> > > + * Remove @cpu from those covered by its HFI instance.
> > > + *
> > > + * On some processors, hardware remembers previous programming settings even
> > > + * after being reprogrammed. Thus, keep HFI enabled even if all CPUs in the
> > > + * die/package of @cpu are offline. See note in intel_hfi_online().
> > > + */
> > > +void intel_hfi_offline(unsigned int cpu)
> > > +{
> > > +	struct cpumask *die_cpumask = topology_core_cpumask(cpu);
> > > +	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
> > > +	struct hfi_instance *hfi_instance;
> > > +
> > > +	if (!boot_cpu_has(X86_FEATURE_INTEL_HFI))
> > > +		return;
> > > +
> > > +	hfi_instance = info->hfi_instance;
> > > +	if (!hfi_instance)
> > > +		return;
> > > +
> > > +	if (!hfi_instance->initialized)
> > > +		return;
> > > +
> > > +	mutex_lock(&hfi_lock);
> > > +
> > > +	/*
> > > +	 * We were using the core cpumask of @cpu to track CPUs in the same
> > > +	 * die/package. Now it is going offline and we need to find another
> > > +	 * CPU we can use.
> > > +	 */
> > > +	if (die_cpumask == hfi_instance->cpus) {
> > > +		int new_cpu;
> > > +
> > > +		new_cpu = cpumask_any_but(hfi_instance->cpus, cpu);
> > > +		if (new_cpu >= nr_cpu_ids)
> > > +			/* All other CPUs in the package are offline. */
> > > +			hfi_instance->cpus = NULL;
> > > +		else
> > > +			hfi_instance->cpus = topology_core_cpumask(new_cpu);
> >
> > Hmmm. Is topology_core_cpumask() updated when CPUs go offline and online?
>
> Yes. A CPU going offline is cleared from its siblings' cpumask [1] and its own [2]
> in remove_siblinginfo() via cpu_disable_common(). A CPU going online is set
> in its siblings' cpumask and its own in set_cpu_sibling_map() [3].

OK, so it is necessary to ensure that intel_hfi_offline() will always
run after remove_siblinginfo() so it sees the updated mask. How do we
ensure that?

> [1]. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/smpboot.c?h=v5.16-rc3#n1592
> [2]. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/smpboot.c?h=v5.16-rc3#n1617
> [3]. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/smpboot.c?h=v5.16-rc3#n657