Re: [PATCH v3 1/6] MIPS: Make play_dead() work for kexec

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Paul,


Thanks for reviewing. Please see my comments below.


On 07/24/2018 04:23 PM, Paul Burton wrote:
Hi Dengcheng,

On Mon, Jul 23, 2018 at 07:48:14AM -0700, Dengcheng Zhu wrote:
Extract play_dead() from CONFIG_HOTPLUG_CPU and share with CONFIG_KEXEC.
Also, add one parameter to it to avoid doing unnecessary things in the case
of kexec.
I'd prefer that we use a separate function to play_dead() for this, for
example we could provide an implementation of crash_smp_send_stop() much
like ARM's which invokes a machine_crash_nonpanic_core() function on all
CPUs other than the crash CPU.

ARM's crash_smp_send_stop() that calls machine_crash_nonpanic_core() is called
in machine_crash_shutdown(), not in machine_kexec(). It's similar in the MIPS
case - default_machine_crash_shutdown().


This would prevent the kexec/kdump functionality from depending on the
board/platform specific play_dead(), and wouldn't need these changes to
all of the implementations of play_dead().

The revised play_dead() is JUST to make sure we are turning off CPUs cleanly.
This function itself already hides board/platform details. So it seems a good
candidate for turning off CPUs for the target platform.

This function is called only in the newly created kexec_smp_reboot(), before
which cpu states have been saved.


We should also be calling crash_save_cpu() on each CPU, which is a
further difference from play_dead().

crash_save_cpu() is already called in machine_crash_shutdown(), which is
prior to machine_kexec().


This is needed to correctly support SMP new kernel in kexec. Before doing
this, all non-crashing CPUs are waiting for the reboot signal
(kexec_ready_to_reboot) to jump to kexec_start_address in kexec_smp_wait(),
which creates some problems like incorrect CPU topology upon booting from
new UP kernel,
Do you know how that happens? I'd expect detecting topology not to
depend upon what state CPUs are in. That should certainly be the case
for smp-cps/CONFIG_MIPS_CPS which detects topology just by reading
CM/CPC/GIC registers.

I didn't debug the topology issue. But it's related to the attempt of *all*
CPUs using the same code from kexec_start_address, thinking they are
dominating the system.

The cleanest way IMO is simple and what this patch is trying to do - turn off
CPUs and do a fresh SMP boot from c0v0.



sluggish performance in MT environment during and after reboot,
The function running on non-crash CPUs would just need to execute a loop
of wait instructions to avoid this.

Same as the topology problem, I didn't look into this performance issue. But
I did try using 'wait' on non-crash CPUs - it didn't work well. In the clean
way, this problem disappears as expected.


new SMP kernel not able to bring up secondary CPU etc.
If the SMP implementation can reset CPUs then that ought not to happen,
since no matter what the CPU was doing Linux should be able to cause it
to reset & run some known piece of code. I'm not sure the current Octeon
SMP code can do that, but there are patches in patchwork that look like
they might (& patches to remove Octeon's current kexec/kdump code which
suggests nobody cares much about it).

I'd suggest we could perhaps add a boolean to struct plat_smp_ops to
indicate whether kexec is supported, and start by setting it to true for
cps_smp_ops. Then we can have machine_kexec_prepare() return an error if
it finds !mp_ops->kexec_supported, and deal with enabling kexec per
platform. I think this would be better than Kconfig because there are
systems where we may use one of multiple SMP implementations - for
example Malta might use smp-cps (which would be OK for kexec) or smp-cmp
(which wouldn't). If we get to a point where all our SMP implementations
can deal with kexec we could remove the field later.

This problem is also related to the original kexec boot mechanism. No
matter what SMP implementation it is, it should be good if we correctly turn
off CPUs and reboot the (SMP) system from the 1st CPU in its own
implementation.


Regards,

Dengcheng

---------------------------------------------------------------------------

*From:* Paul Burton <mailto:paul.burton@xxxxxxxx>
*Sent:* Tuesday, July 24, 2018 4:23PM
*To:* Dengcheng Zhu <mailto:dzhu@xxxxxxxxxxxx>
*Cc:* Pburton <mailto:pburton@xxxxxxxxxxxx>, Ralf <mailto:ralf@xxxxxxxxxxxxxx>, Linux-mips <mailto:linux-mips@xxxxxxxxxxxxxx>, Rachel.mozes <mailto:rachel.mozes@xxxxxxxxx>
*Subject:* Re: [PATCH v3 1/6] MIPS: Make play_dead() work for kexec

Hi Dengcheng,

On Mon, Jul 23, 2018 at 07:48:14AM -0700, Dengcheng Zhu wrote:

Extract play_dead() from CONFIG_HOTPLUG_CPU and share with CONFIG_KEXEC.
Also, add one parameter to it to avoid doing unnecessary things in the case
of kexec.

I'd prefer that we use a separate function to play_dead() for this, for
example we could provide an implementation of crash_smp_send_stop() much
like ARM's which invokes a machine_crash_nonpanic_core() function on all
CPUs other than the crash CPU.

This would prevent the kexec/kdump functionality from depending on the
board/platform specific play_dead(), and wouldn't need these changes to
all of the implementations of play_dead().

We should also be calling crash_save_cpu() on each CPU, which is a
further difference from play_dead().

This is needed to correctly support SMP new kernel in kexec. Before doing
this, all non-crashing CPUs are waiting for the reboot signal
(kexec_ready_to_reboot) to jump to kexec_start_address in kexec_smp_wait(),
which creates some problems like incorrect CPU topology upon booting from
new UP kernel,

Do you know how that happens? I'd expect detecting topology not to
depend upon what state CPUs are in. That should certainly be the case
for smp-cps/CONFIG_MIPS_CPS which detects topology just by reading
CM/CPC/GIC registers.

sluggish performance in MT environment during and after reboot,

The function running on non-crash CPUs would just need to execute a loop
of wait instructions to avoid this.

new SMP kernel not able to bring up secondary CPU etc.

If the SMP implementation can reset CPUs then that ought not to happen,
since no matter what the CPU was doing Linux should be able to cause it
to reset & run some known piece of code. I'm not sure the current Octeon
SMP code can do that, but there are patches in patchwork that look like
they might (& patches to remove Octeon's current kexec/kdump code which
suggests nobody cares much about it).

I'd suggest we could perhaps add a boolean to struct plat_smp_ops to
indicate whether kexec is supported, and start by setting it to true for
cps_smp_ops. Then we can have machine_kexec_prepare() return an error if
it finds !mp_ops->kexec_supported, and deal with enabling kexec per
platform. I think this would be better than Kconfig because there are
systems where we may use one of multiple SMP implementations - for
example Malta might use smp-cps (which would be OK for kexec) or smp-cmp
(which wouldn't). If we get to a point where all our SMP implementations
can deal with kexec we could remove the field later.

Thanks,
    Paul






[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux