On Fri, Feb 12, 2021 at 7:42 PM Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> wrote: > > Alexey Klimov <aklimov@xxxxxxxxxx> writes: > > int cpu_device_up(struct device *dev) > > Yeah, definitely better to do the wait here. > > > int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) > > { > > - int cpu, ret = 0; > > + struct device *dev; > > + cpumask_var_t mask; > > + int cpu, ret; > > + > > + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) > > + return -ENOMEM; > > > > + ret = 0; > > cpu_maps_update_begin(); > > for_each_online_cpu(cpu) { > > if (topology_is_primary_thread(cpu)) > > @@ -2099,18 +2098,35 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) > > * called under the sysfs hotplug lock, so it is properly > > * serialized against the regular offline usage. > > */ > > - cpuhp_offline_cpu_device(cpu); > > + dev = get_cpu_device(cpu); > > + dev->offline = true; > > + > > + cpumask_set_cpu(cpu, mask); > > } > > if (!ret) > > cpu_smt_control = ctrlval; > > cpu_maps_update_done(); > > + > > + /* Tell user space about the state changes */ > > + for_each_cpu(cpu, mask) { > > + dev = get_cpu_device(cpu); > > + kobject_uevent(&dev->kobj, KOBJ_OFFLINE); > > + } > > + > > + free_cpumask_var(mask); > > return ret; > > } > > Hrm, should the dev manipulation be kept in one place, something like > this? The first section of comment seems problematic to me with regards to such move: * As this needs to hold the cpu maps lock it's impossible * to call device_offline() because that ends up calling * cpu_down() which takes cpu maps lock. cpu maps lock * needs to be held as this might race against in kernel * abusers of the hotplug machinery (thermal management). Cpu maps lock is released in cpu_maps_update_done() hence we will move dev->offline out of cpu maps lock. Maybe I misunderstood the comment and it relates to calling cpu_down_maps_locked() under lock to avoid race? I failed to find the abusers of hotplug machinery in drivers/thermal/* to track down the logic of potential race but I might have overlooked. Anyway, if we move the update of dev->offline out, then it makes sense to restore cpuhp_{offline,online}_cpu_device back and just use it. I guess I'll update and re-send the patch and see how it goes. > diff --git a/kernel/cpu.c b/kernel/cpu.c > index 8817ccdc8e112..aa21219a7b7c4 100644 > --- a/kernel/cpu.c > +++ b/kernel/cpu.c > @@ -2085,11 +2085,20 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) > ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE); > if (ret) > break; > + > + cpumask_set_cpu(cpu, mask); > + } > + if (!ret) > + cpu_smt_control = ctrlval; > + cpu_maps_update_done(); > + > + /* Tell user space about the state changes */ > + for_each_cpu(cpu, mask) { > /* > - * As this needs to hold the cpu maps lock it's impossible > + * When the cpu maps lock was taken above it was impossible > * to call device_offline() because that ends up calling > * cpu_down() which takes cpu maps lock. cpu maps lock > - * needs to be held as this might race against in kernel > + * needed to be held as this might race against in kernel > * abusers of the hotplug machinery (thermal management). > * > * So nothing would update device:offline state. That would Yeah, reading how you re-phrased it, this seems to be about cpu_down_maps_locked()/device_offline() locks and race rather than updating stale dev->offline. Thank you, Alexey