On Tue, May 10, 2022 at 10:28:11AM +0200, Thomas Gleixner wrote:
> On Tue, May 10 2022 at 11:38, Pingfan Liu wrote:
> > On Mon, May 09, 2022 at 12:55:21PM +0200, Thomas Gleixner wrote:
> >> On Mon, May 09 2022 at 12:13, Pingfan Liu wrote:
> >> > The following code chunk repeats in both
> >> > migrate_to_reboot_cpu() and smp_shutdown_nonboot_cpus():
> >> > This is due to a breakage like the following:
> >>
> >> I don't see what's broken here.
> >>
> >
> > No, no broken. Could it be better to replace 'breakage' with
> > 'breakin'?
>
> There is no break-in. There is a phase where CPU hotplug is reenabled,
> which might be avoided.
>

OK, I will rephrase like this.

> >> > +/* primary_cpu keeps unchanged after migrate_to_reboot_cpu() */
> >>
> >> This comment makes no sense.
> >>
> >
> > Since migrate_to_reboot_cpu() disables cpu hotplug, so the selected
> > valid online cpu -- primary_cpu keeps unchange.
>
> So what is that parameter for then? If migrate_to_reboot_cpu() ensured
> that the current task is on the reboot CPU then this parameter is
> useless, no?
>

Yes, it is useless after this patch. I will post V4 to kill it.

> >> >  void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> >> >  {
> >> >      unsigned int cpu;
> >> >      int error;
> >> >
> >> > +    /*
> >> > +     * Block other cpu hotplug event, so primary_cpu is always online if
> >> > +     * it is not touched by us
> >> > +     */
> >> >      cpu_maps_update_begin();
> >> > -
> >> >      /*
> >> > -     * Make certain the cpu I'm about to reboot on is online.
> >> > -     *
> >> > -     * This is inline to what migrate_to_reboot_cpu() already do.
> >> > +     * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> >> > +     * no further code needs to use CPU hotplug (which is true in
> >> > +     * the reboot case). However, the kexec path depends on using
> >> > +     * CPU hotplug again; so re-enable it here.
> >>
> >> You want to reduce confusion, but in reality this is even more confusing
> >> than before.
> >>
> >
> > This __cpu_hotplug_enable() can be considered to defer from kernel_kexec() to
> > arch-dependent code chunk (here), which is a more proper point.
> >
> > Could it make things better by rephrasing the words as the following?
> > migrate_to_reboot_cpu() disables CPU hotplug to prevent the selected
> > reboot cpu from disappearing. But arches need cpu_down to hot remove
> > cpus except rebooting-cpu, so re-enabling cpu hotplug again.
>
> Can you please use proper words. arches is not a word and it's closer to
> the plural of arch, than to the word architecture. This is not twitter.
>

OK, I will correct it.

> And no, the architectures do not need cpu_down() at all. This very
> function smp_shutdown_nonboot_cpus() invokes cpu_down_maps_locked() to
> shut down the non boot CPUs. That fails when cpu_hotplug_disabled != 0.
>

Yes. I will pay attention to the accuracy of the description.

> >> >       */
> >> > -    if (!cpu_online(primary_cpu))
> >> > -        primary_cpu = cpumask_first(cpu_online_mask);
> >> > +    __cpu_hotplug_enable();
> >>
> >> How is this decrement solving anything? At the end of this function, the
> >> counter is incremented again. So what's the point of this exercise?
> >>
> > This decrement enables the cpu hot-removing. Since
> > smp_shutdown_nonboot_cpus()->cpu_down_maps_locked(), if
> > cpu_hotplug_disabled, it returns -EBUSY.
>
> Correct, so why can't you spell that out in concise words in the first
> place right at that comment which reenables hotplug?
>

OK, thanks for the suggestion.

> >> What does that for arch/powerpc/kernel/kexec_machine64.c now?
> >>
> >> Nothing, as far as I can tell. Which means you basically reverted
> >> 011e4b02f1da ("powerpc, kexec: Fix "Processor X is stuck" issue during
> >> kexec from ST mode") unless I'm completely confused.
> >>
> >
> > Oops. Forget about powerpc. Considering the cpu hotplug is an
> > arch-dependent feature in machine_shutdown(), as x86 does not need it.
>
> It's not a feature, it's a architecture specific requirement. x86 is
> irrelevant here because this is a powerpc requirement.
>

Yes.

> >> This is tinkering at best. Can we please sit down and rethink this whole
> >> machinery instead of applying random duct tape to it?
> >>
> > I try to make code look consistent.
>
> Emphasis on try. So far the attempt failed and resulted in a regression.
>

I will fix the powerpc issue and post V4 after a test.

Thanks for your precious time.

Best Regards,

Pingfan
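
For reference, the counter behaviour discussed in this thread can be sketched in user space. This is only a rough model, not the kernel code: every model_* name below is hypothetical, the reenable flag exists purely to show the contrast, and locking (cpu_maps_update_begin() and friends) is left out. It assumes what the thread states: the disable path raises cpu_hotplug_disabled, __cpu_hotplug_enable() lowers it, cpu_down_maps_locked() returns -EBUSY while it is non-zero, and smp_shutdown_nonboot_cpus() raises it again at the end.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's cpu_hotplug_disabled counter. */
static int model_hotplug_disabled;

/* Models the disable path: each call adds one level of "disabled". */
static void model_hotplug_disable(void)
{
    model_hotplug_disabled++;
}

/* Models __cpu_hotplug_enable(): drops one level, never below zero. */
static void model_hotplug_enable(void)
{
    if (model_hotplug_disabled > 0)
        model_hotplug_disabled--;
}

/* Models the gate in cpu_down_maps_locked(): refuse while disabled. */
static int model_cpu_down(unsigned int cpu)
{
    (void)cpu; /* the model does not track per-CPU state */
    if (model_hotplug_disabled)
        return -EBUSY;
    return 0;
}

/*
 * Models the flow under discussion: optionally re-enable hotplug, try to
 * take down every CPU except primary_cpu, then raise the counter again.
 * Returns how many CPUs could not be taken down.
 */
static unsigned int model_shutdown_nonboot_cpus(unsigned int primary_cpu,
                                                unsigned int nr_cpus,
                                                int reenable)
{
    unsigned int cpu, failed = 0;

    if (reenable)
        model_hotplug_enable();

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu == primary_cpu)
            continue;
        if (model_cpu_down(cpu) != 0)
            failed++;
    }

    model_hotplug_disable(); /* counter is incremented again at the end */
    return failed;
}
```

With the counter at 1 on entry (as after migrate_to_reboot_cpu() in the patch under review), calling the model with reenable set takes every non-primary CPU down, while calling it without leaves all of them up because each attempt hits the -EBUSY gate.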