Hi Dengcheng, On Fri, Sep 07, 2018 at 12:47:45PM -0700, Dengcheng Zhu wrote: > On cache issue, what if kexec_this_cpu() does local i$ flush and scache > invalidation? This is done only on CPU0, no IPI needed. And only CUP0 will use > reboot_code_buffer. In this way, marking other CPUs offline to prevent being > sent IPI is not needed. The problem is that the "only CPU0 will use reboot_code_buffer" part isn't true because CPU0 might not be the one that writes to it. Though perhaps if we moved the memcpy() from machine_kexec() to kexec_this_cpu() that could work. It still doesn't seem ideal to me that we'd have to even consider IPIs & the fact that we might silently hang if we accidentally send one. I really think it would be cleaner to mark CPUs offline & the problem is solved naturally - we shouldn't IPI these CPUs, and marking them offline means that we won't. Problem solved, and consistently with other architectures. > As to "the associated fragility of needing to avoid anything that might IPI > them", I hope that's not a problem as the kexec sequence is determined: The last > IPI received is to ask them to stand by on reboot signal, because other than > kexec there is no other kernel code beyond our control, which might send IPIs. But it does mean we can't call certain things, like __flush_cache_all(), which means that developers need to 1) know that and 2) remember to avoid any functions like this that perform an IPI behind the scenes. It just feels like loading a gun & leaving it pointed at our feet - by itself it's not necessarily going to cause a problem, but it would be oh so easy to trigger problems later. > > ...but the new kernel will have no knowledge of whether the old kernel > had CPUs marked online or offline, so I don't follow the argument? > > [Dengcheng] I meant the new kernel shouldn't need CPU online info from the old > kernel. The new one is doing a "fresh" boot, provided CPUs 1+ have been halted. > > Regarding the use of play_dead(), it's simple and generic - MIPS CPUs have > implemented their own play_dead(), which we can use to stop execution. Other > options may still come down to play_dead() to handle MT/MC details. The concern > of running it in parallel is really about the online status, which has been > explained above and in my last email. But play_dead() really doesn't mean what you're suggesting it means. What we want for kexec is that the CPU is stopped entirely - either halted or powered down. Failing that it needs to be running a loop from somewhere that we know isn't going to be overwritten, and the code it's running needs to be able to hand over to the new kernel later (eg. the existing kexec_smp_wait loop implements this latter approach). Whilst the smp-cps.c implementation of play_dead() will currently halt or power down the CPU, that's not universally true & doesn't necessarily even need to remain true for smp-cps.c in the future (eg. we could decide the halt/power down step would be cleaner to do in cps_cpu_die). A quick look at the other play_dead() implementations we have shows that both the cavium-octeon & loongson3 versions leave the CPU running a loop (within the current/old kernel), whilst the bmips version just sits at a wait instruction before jumping back to the fixed location of bmips_secondary_reentry (in the current/old kernel) once interrupted. None of those are suitable for kexec. In general the CPU hotplug code just needs that when play_dead() runs on the CPU going offline, and the struct plat_smp_ops cpu_die() callback runs on some other CPU, the CPU being offlined will be quiesced to a degree that it won't interfere with Linux. This is different to kexec where we need the CPU to stop in such a way that it's fine that we might overwrite the old kernel, and then be able to start it running the new kernel later. As described before, that means it either needs to halt or power down until the new kernel takes control of it or it needs to run some loop from some location that we know won't be overwritten. play_dead() is simple & already exists, it just doesn't do quite what we need kexec to do outside of the one system you're looking at. > A 'nicer' implementation stopping non-boot CPUs from one CPU might be available, > but handling code duplication of play_dead() and saving CPU states could be a > problem. Code that can be shared between play_dead() & kexec can still be shared if we have a different callback - the particular implementation of the two would just have both call a common function for the part that's common. > For now, no __flush_cache_all() in machine_kexec() + no marking offline in > kexec_this_cpu() + scache invalidation in kexec_this_cpu() might be a way to go. For the system you're interested in, sure that works. But as I've pointed out in my last few emails it does not work in general for other systems that we also need to support in the kernel. Thanks, Paul