Re: Calxeda Midway crashes on boot with KVM on 3.14-rc

Rob Herring <rob.herring@xxxxxxxxxx> · Wed, 19 Feb 2014 15:34:21 -0600

On Wed, Feb 19, 2014 at 9:41 AM, Lorenzo Pieralisi
<lorenzo.pieralisi@xxxxxxx> wrote:
> Hi Andre,
>
> On Wed, Feb 19, 2014 at 02:40:27PM +0000, Andre Przywara wrote:
>> Hi,
>>
>> On the Calxeda Midway box, a 3.14-rc kernel compiled with KVM support
>> crashes during kernel boot:
>>
>> ...
>> [    3.663897] Kernel panic - not syncing: unexpected prefetch abort in
>> Hyp mode at: 0x685760
>> [    3.663897] unexpected data abort in Hyp mode at: 0xc067d150
>> [    3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
>> [    3.663897]
>> [    3.663901] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.14.0-rc3 #118
>> [    3.663912] CPU1: stopping
>> [    3.663916] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc3 #118
>> [    3.663919] CPU0: stopping
>> [    3.663923] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc3 #118
>> [    3.744453] CPU2: stopping
>> [    3.747151] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc3 #118
>>
>>
>> I traced it down to this commit:
>>
>> commit 1fcf7ce0c60213994269fb59569ec161eb6e08d6
>> Author: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
>> Date:   Mon Aug 5 15:04:46 2013 +0100
>>
>>     arm: kvm: implement CPU PM notifier
>>
>> It seems there is some address translation confusion when a CPU
>> comes out of suspend. I added some debug printks to shed light on
>> the sequence of power management events.
>> The initial per-cpu KVM setup works fine:
>> [ 2.402059]  switch from the HYP stub to our own HYP init vector
>> [ 2.402060]  switch from the HYP stub to our own HYP init vector
>> [ 2.402062]  switch from the HYP stub to our own HYP init vector
>> [ 2.402065]  __hyp_set_vectors(685760)
>> [ 2.402067]  __hyp_set_vectors(685760)
>> [ 2.402071]  __hyp_set_vectors() done
>> [ 2.402073]  __hyp_set_vectors() done
>> [ 2.444884]  __hyp_set_vectors(685760)
>> [ 2.451486]  __hyp_set_vectors() done
>> [ 2.456007]  switch from the HYP stub to our own HYP init vector
>> [ 2.462871]  __hyp_set_vectors(685760)
>> [ 2.469473]  __hyp_set_vectors() done
>> [ 2.474035] kvm [1]: interrupt-controller@fff14000 IRQ25
>> [ 2.479411] kvm [1]: timer IRQ27
>> [ 2.482636] kvm [1]: Hyp mode initialized successfully
>>
>> Also the new notifier registration goes fine:
>> [ 2.487773]  calling hyp_cpu_pm_init() (address: c001460c)
>> [ 2.494121]  calling cpu_pm_register_notifier(c08c3dc8)
>> [ 2.502199]  cpu_pm_register_notifier(fn=0xc001465c);
>> [ 2.507157]  cpu_pm_register_notifier() finished
>> [ 2.512631]  hyp_cpu_pm_init() returned
>>
>> Later on (still before userland starts), CPU3 apparently goes to
>> sleep (cmd=0) and is woken up again (cmd=2) almost immediately:
>>
>> [ 3.643923]  hyp_init_cpu_pm_notifier(self=c08c3dc8,cmd=0,v=(null)),CPU: 3
>> [ 3.643925]  hyp_init_cpu_pm_notifier returns NOTIFY_DONE
>> [ 3.643933]  psci_cpu_suspend(state.id=0,state.type=1, entry=0x8c44bc);
>> [ 3.663884]  hyp_init_cpu_pm_notifier(self=c08c3dc8,cmd=2,v=(null)),CPU: 3
>> [ 3.663886]  calling cpu_init_hyp_mode(NULL) (address=c0014518)
>> [ 3.663888]  switch from the HYP stub to our own HYP init vector
>> [ 3.663890]  __hyp_set_vectors(685760)
>> [ 3.663897] Kernel panic - not syncing: unexpected prefetch abort in
>> Hyp mode at: 0x685760
>> [ 3.663897] unexpected data abort in Hyp mode at: 0xc067d150
>> [ 3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0
>>
>> Apparently it tries to access the physical address. I wonder if there
>> is something missing in the Calxeda PSCI firmware?
>>
>> Does anyone have a clue what could be going on here?
>> Unfortunately it crashes every time at boot when both KVM and
>> CONFIG_PM are enabled, and it worked fine with 3.13.
>
> I think the problem is this: the CPU idle driver calls
> psci_cpu_suspend, which fails or does not return from reset, so the
> CPU state is not lost on resume. The CPU PM notifier is triggered
> anyway, and that triggers this issue.

Indeed. Since we never got power gating to work properly for cpuidle,
the PSCI implementation is simply a wfi.

>
> Can you give the following patch a go, please? (Whether it is a proper
> fix or not remains to be seen.)
>
> Thanks,
> Lorenzo
>
> -- >8 --
> Subject: [PATCH] drivers: cpuidle: calxeda: fix CPU PM notifier usage upon
>  suspend return
>
> CPU PM notifiers are meant to be triggered when a CPU is reset
> after entering a power down procedure so that peripheral CPU state
> can be saved and restored. If the power down fails, state is not lost
> and cpu_pm_notifier must not be triggered on exit.
>
> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> ---
>  drivers/cpuidle/cpuidle-calxeda.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-calxeda.c b/drivers/cpuidle/cpuidle-calxeda.c
> index 6e51114..5034f7a 100644
> --- a/drivers/cpuidle/cpuidle-calxeda.c
> +++ b/drivers/cpuidle/cpuidle-calxeda.c
> @@ -41,9 +41,12 @@ static int calxeda_pwrdown_idle(struct cpuidle_device *dev,
>                                 struct cpuidle_driver *drv,
>                                 int index)
>  {
> +       int ret;
> +
>         cpu_pm_enter();
> -       cpu_suspend(0, calxeda_idle_finish);
> -       cpu_pm_exit();
> +       ret = cpu_suspend(0, calxeda_idle_finish);
> +       if (!ret)
> +               cpu_pm_exit();

It seems a little strange that cpu_pm_enter() does not have to be
balanced with a cpu_pm_exit() call. Couldn't the enter notifiers tear
down things that need to be re-enabled?
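
To make the concern concrete, here is a minimal user-space sketch, not
kernel code. Everything in it is a hypothetical stand-in: mock_notify()
plays the role of the notifier chain with a single KVM-like listener
that redoes the Hyp init on exit (cmd=2 in the log above), and
mock_cpu_suspend() assumes that cpu_suspend() returns non-zero when the
finisher comes back without the CPU having been reset, which is what
the patch relies on.

```c
#include <assert.h>
#include <stdio.h>

/* Event values mirror the cmd=0 / cmd=2 seen in the debug log. */
enum pm_event { PM_ENTER = 0, PM_EXIT = 2 };

int hyp_reinit_count;   /* times the mock listener redid the Hyp init */
int unbalanced_enters;  /* enter events without a matching exit */

void reset_counters(void)
{
	hyp_reinit_count = 0;
	unbalanced_enters = 0;
}

/* Hypothetical notifier chain with one KVM-like listener. */
void mock_notify(enum pm_event ev)
{
	if (ev == PM_ENTER) {
		unbalanced_enters++;
	} else if (ev == PM_EXIT) {
		assert(unbalanced_enters > 0);	/* exit must follow an enter */
		unbalanced_enters--;
		hyp_reinit_count++;		/* listener re-runs its init */
	}
}

/* Assumption: 0 means "came back through reset", non-zero means the
 * finisher returned (e.g. a plain wfi) and no state was lost. */
int mock_cpu_suspend(int state_lost)
{
	return state_lost ? 0 : 1;
}

/* Shape of the driver before the patch: exit is always signalled. */
void idle_always_exit(int state_lost)
{
	mock_notify(PM_ENTER);
	mock_cpu_suspend(state_lost);
	mock_notify(PM_EXIT);
}

/* Shape of the driver after the patch: exit only if state was lost. */
void idle_exit_on_success(int state_lost)
{
	mock_notify(PM_ENTER);
	if (!mock_cpu_suspend(state_lost))
		mock_notify(PM_EXIT);
}

void report(const char *label)
{
	printf("%s: hyp_reinit=%d unbalanced_enters=%d\n",
	       label, hyp_reinit_count, unbalanced_enters);
}
```

With state_lost=0 (the wfi-only case on Midway), idle_always_exit()
redoes the Hyp init although nothing was lost, while
idle_exit_on_success() skips it but leaves one enter without a matching
exit. Whether any real notifier tears something down on enter is the
open question.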

Rob