Hi Dengcheng, On Thu, Sep 06, 2018 at 12:19:49PM -0700, Dengcheng Zhu wrote: > > I didn't apply patch 4 because I'm not sure it's correct & I believe the > > changes in the branch above should take care of it - CPUs that reach > > kexec_this_cpu() are maked offline, so they shouldn't be IPI'd by > > __flush_cache_all(). > > I believe patch 4 is necessary. As mentioned in the code comment and patch > comment of that patch, machine_crash_shutdown() is called prior to > machine_kexec() in the kdump sequence. So other CPUs have disabled local > IRQs waiting for the reboot signal. > > In fact, in kexec_this_cpu() [you renamed and modified kexec_smp_reboot()], > the added marking CPU offline will cause system hang (tested). This is > because it will change how play_dead() will work. So the problem is that patch 4 isn't really right either, and I suspect the hang you mention probably shows a problem with the whole play_dead() approach from patches 1 & 2 too. On the cache flushing, what we really need is for stores performed by the CPU running machine_kexec() via its dcache to be observed by the icache of the CPU that jumps into reboot_code_buffer, which is always CPU 0 after patch 2. Using local_flush_icache_range() will only ensure that the icache of the CPU running machine_kexec() observes the changes made by that same CPU, and does nothing with CPU 0 unless they happen to be the same CPU or siblings sharing caches. Consider I6x00 CPUs for example - patch 4 may *almost* work OK because both cpu_has_ic_fills_f_dc & cpu_icache_snoops_remote_store are true, so when the machine_kexec() CPU writes to reboot_code_buffer CPU 0's icache will observe that & will fill from dcache without needing it to be written back to L2. It's still not quite right because if CPU 0's icache already contained any valid lines within reboot_code_buffer (which could happen speculatively) then it'll execute stale/garbage code. The hang you mention is likely down to the fact that play_dead(), at least the smp-cps.c implementation of it, is written for the CPU hotplug infrastructure which will only operate on one CPU at a time. The loop looking for a sibling CPU just isn't going to be safe to run on multiple CPUs in parallel like patch 2 does. That's true of either your original patch or my modification - it's just that my modified patch will cause siblings to race towards powering off the core, whilst your original will cause them to all halt themselves & never power off the core. Which inclines me to return to my earlier thought that perhaps we shouldn't use play_dead() for this at all. > > The CPU that runs machine_kexec() should still > > flush its dcache (& the L2), and then CPU 0 invalidates its icache in > > kexec_this_cpu() prior to jumping into reboot_code_buffer. > > > > I'm also still not sure about patch 6 - since no platforms besides the > > arch/mips/generic/ make use of the UHI boot code yet I think it'd be > > best to leave as-is. If we do ever need to use it from another platform > > then we can deal with the problem then. If an out of tree platform needs > > to use it then for now it could copy generic_kexec_prepare() and deal > > with removing the duplication when it heads upstream. > > Understood. It really depends on how this problem is viewed. If patch #6 > is considered creating a framework for future UHI platforms, then it has > the following facts: > > * It doesn't create code redundancy. I mean, it does not add unnecessary > code to the kernel. > > * Out of tree platforms will get access to this functionality by a simple > "select UHI_BOOT". When the kernel developer of an out-of-tree platform > wants to make kexec work, they will naturally look at machine_kexec.c, > where they will find this UHI stuff, obviously telling them to do > "select UHI_BOOT". Otherwise, unless they google onto this discussion > thread, it's harder to know the solution to the kexec_args related > problem hides in the code of another platform (Generic). > > * It simplifies work if the out of tree platform wants to upstream. Well, ideally future platforms will just use arch/mips/generic rather than adding more platform/board code at all. Right now the only reason not to is if a system has a memory map that doesn't fit nicely, which should be resolved at some point by mapping the kernel which will allow the generic kernel to better support differing physical address maps. So I hope we don't add more platform code anyway, which would make all this moot. I certainly don't want to encourage adding more by explicitly making it easier to do. And if there does come a point where we need to add more platform code for some good reason then we can deal with this at that point rather than speculating now. For out of tree platforms, I don't think that copying the generic_kexec_prepare() function is particularly onerous anyway, and should be trivial to remove if such code is submitted upstream. Whilst I wouldn't want to make submitting code upstream more difficult, it's likely to need some amount of rework if submitted upstream anyway so this doesn't seem like a big deal. I don't think upstream should be particularly concerned with making life easy for out of tree code - if one wants the benefit of their code being taken into account upstream, then one should submit one's code upstream. In the paraphrased words of my pastor at a marriage prep class - no benefits (except those already conferred by free access to open source software) without commitment. Thanks, Paul