Re: [PATCH v4 0/6] MIPS: kexec/kdump: Fix smp reboot and other issues

Paul Burton <paul.burton@xxxxxxxx> · Thu, 6 Sep 2018 13:34:55 -0700

Hi Dengcheng,

On Thu, Sep 06, 2018 at 12:19:49PM -0700, Dengcheng Zhu wrote:
> > I didn't apply patch 4 because I'm not sure it's correct & I believe the
> > changes in the branch above should take care of it - CPUs that reach
> > kexec_this_cpu() are maked offline, so they shouldn't be IPI'd by
> > __flush_cache_all().
> 
> I believe patch 4 is necessary. As mentioned in the code comment and patch
> comment of that patch, machine_crash_shutdown() is called prior to
> machine_kexec() in the kdump sequence. So other CPUs have disabled local
> IRQs waiting for the reboot signal.
> 
> In fact, in kexec_this_cpu() [you renamed and modified kexec_smp_reboot()],
> the added marking CPU offline will cause system hang (tested). This is
> because it will change how play_dead() will work.

So the problem is that patch 4 isn't really right either, and I suspect
the hang you mention probably shows a problem with the whole play_dead()
approach from patches 1 & 2 too.

On the cache flushing, what we really need is for stores performed by
the CPU running machine_kexec() via its dcache to be observed by the
icache of the CPU that jumps into reboot_code_buffer, which is always
CPU 0 after patch 2. Using local_flush_icache_range() will only ensure
that the icache of the CPU running machine_kexec() observes the changes
made by that same CPU, and does nothing with CPU 0 unless they happen to
be the same CPU or siblings sharing caches.

Consider I6x00 CPUs for example - patch 4 may *almost* work OK because
both cpu_has_ic_fills_f_dc & cpu_icache_snoops_remote_store are true, so
when the machine_kexec() CPU writes to reboot_code_buffer CPU 0's icache
will observe that & will fill from dcache without needing it to be
written back to L2. It's still not quite right because if CPU 0's icache
already contained any valid lines within reboot_code_buffer (which could
happen speculatively) then it'll execute stale/garbage code.

The hang you mention is likely down to the fact that play_dead(), at
least the smp-cps.c implementation of it, is written for the CPU hotplug
infrastructure which will only operate on one CPU at a time. The loop
looking for a sibling CPU just isn't going to be safe to run on multiple
CPUs in parallel like patch 2 does. That's true of either your original
patch or my modification - it's just that my modified patch will cause
siblings to race towards powering off the core, whilst your original
will cause them to all halt themselves & never power off the core.

Which inclines me to return to my earlier thought that perhaps we
shouldn't use play_dead() for this at all.

> > The CPU that runs machine_kexec() should still
> > flush its dcache (& the L2), and then CPU 0 invalidates its icache in
> > kexec_this_cpu() prior to jumping into reboot_code_buffer.
> > 
> > I'm also still not sure about patch 6 - since no platforms besides the
> > arch/mips/generic/ make use of the UHI boot code yet I think it'd be
> > best to leave as-is. If we do ever need to use it from another platform
> > then we can deal with the problem then. If an out of tree platform needs
> > to use it then for now it could copy generic_kexec_prepare() and deal
> > with removing the duplication when it heads upstream.
> 
> Understood. It really depends on how this problem is viewed. If patch #6
> is considered creating a framework for future UHI platforms, then it has
> the following facts:
> 
> * It doesn't create code redundancy. I mean, it does not add unnecessary
>   code to the kernel.
> 
> * Out of tree platforms will get access to this functionality by a simple
>   "select UHI_BOOT". When the kernel developer of an out-of-tree platform
>   wants to make kexec work, they will naturally look at machine_kexec.c,
>   where they will find this UHI stuff, obviously telling them to do
>   "select UHI_BOOT". Otherwise, unless they google onto this discussion
>   thread, it's harder to know the solution to the kexec_args related
>   problem hides in the code of another platform (Generic).
> 
> * It simplifies work if the out of tree platform wants to upstream.

Well, ideally future platforms will just use arch/mips/generic rather
than adding more platform/board code at all. Right now the only reason
not to is if a system has a memory map that doesn't fit nicely, which
should be resolved at some point by mapping the kernel which will allow
the generic kernel to better support differing physical address maps.

So I hope we don't add more platform code anyway, which would make all
this moot. I certainly don't want to encourage adding more by explicitly
making it easier to do. And if there does come a point where we need to
add more platform code for some good reason then we can deal with this
at that point rather than speculating now.

For out of tree platforms, I don't think that copying the
generic_kexec_prepare() function is particularly onerous anyway, and
should be trivial to remove if such code is submitted upstream. Whilst I
wouldn't want to make submitting code upstream more difficult, it's
likely to need some amount of rework if submitted upstream anyway so
this doesn't seem like a big deal.

I don't think upstream should be particularly concerned with making life
easy for out of tree code - if one wants the benefit of their code being
taken into account upstream, then one should submit one's code upstream.
In the paraphrased words of my pastor at a marriage prep class - no
benefits (except those already conferred by free access to open source
software) without commitment.

Thanks,
    Paul