Re: Coresight causes synchronous external abort on msm8916

Hi Stephan

On 21/06/2019 17:06, Stephan Gerhold wrote:
Hi all,

Thanks for all your replies!

On Wed, Jun 19, 2019 at 02:16:38PM -0600, Mathieu Poirier wrote:
On Wed, 19 Jun 2019 at 12:39, Stephan Gerhold <stephan@xxxxxxxxxxx> wrote:

Hi,

On Wed, Jun 19, 2019 at 09:49:03AM +0100, Suzuki K Poulose wrote:
Hi Stephan,

On 18/06/2019 21:26, Stephan Gerhold wrote:
Hi,

I'm trying to run mainline Linux on a smartphone with MSM8916 SoC.
It works surprisingly well, but the coresight devices seem to cause the
following crash shortly after userspace starts:

      Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP

...



In this case I'm using a simple device tree similar to apq8016-sbc,
but it also happens using something as simple as msm8916-mtp.dts
on this particular device.
    (Attached: dmesg log with msm8916-mtp.dts and arm64 defconfig)

I can avoid the crash and boot without any further problems by disabling
every coresight device defined in msm8916.dtsi, e.g.:

     tpiu@820000 { status = "disabled"; };

...


I don't have any use for coresight at the moment,
but it seems somewhat odd to put this in the device specific dts.

Any idea what could be causing this crash?

This is most likely due to the missing power domain support. The CoreSight
components are usually in a debug power domain. Unless that domain is turned
on (either by specifying the proper power domain IDs for the power management
protocol supported by the firmware, or via other hacks, e.g. connecting a DS-5
to keep the debug power domain turned on, which works on Juno), accesses to
the components can abort like this.

Interesting, thanks a lot!

In this case I'm wondering how it works on the Dragonboard 410c.

There can be two problems:

1) CPUidle is enabled on your platform and as I pointed out before,
that won't work.  There are patches circulating[1] to fix that problem
but it still needs a little bit of work.

I tried disabling cpuidle (see [1]), but unfortunately it did not help.

[1]: https://lore.kernel.org/linux-arm-msm/20190619173743.GA937@xxxxxxxxxxx/


2) As Suzuki pointed out the debug power domain may not be enabled by
default on your platform, something I would understand if it is a
production device.  There is nothing I can do on that front.

Indeed, this is a production device.
The downstream (production) kernel does not seem to have coresight
enabled, so it is very well possible that the debug power domain is not
enabled by the firmware.


[1]. https://www.spinics.net/lists/arm-kernel/msg735707.html

Does it enable these power domains in the firmware?
   (Assuming it boots without this error...)

The debug power domain is enabled by default on the 410c and the board
boots without error.

Good to know, thank you!



If coresight is not working properly on all/most msm8916 devices,
shouldn't coresight be disabled by default in msm8916.dtsi?

It is disabled in the arm64 defconfig, so it shouldn't bother you.

Indeed, I already have CONFIG_CORESIGHT disabled.
At the moment, I'm using arm64 defconfig as-is, with no modifications.

So the error happens in the AMBA bus code even when CONFIG_CORESIGHT is
disabled, as Suzuki suspected [2].

[2]: https://lore.kernel.org/linux-arm-msm/6bb74dcc-62e4-5310-5884-9c4b82ce5be9@xxxxxxx/


At least until those power domains can be set up by the kernel.

If this is a device-specific issue, what would be an acceptable solution
for mainline?
Can I turn on these power domains from the kernel?

Yes, if you have the SoC's TRM.

I guess "TRM" refers to Technical Reference Manual?
Unfortunately, I don't have access to any documentation that is not
publicly available on the Internet.


Or is it fine to disable coresight for this device with the snippet above?

I'm not actually trying to use coresight, I just want the device to boot :)
And since I am considering submitting my device tree for inclusion in
mainline, I want to ask in advance how I should tackle this problem.

Simply don't enable coresight in the kernel config if the code isn't
mature enough to properly handle the relevant power domains using the
PM runtime API.

The error occurs without CONFIG_CORESIGHT, and I believe there is no
way to disable CONFIG_AMBA (it is selected by CONFIG_ARM64 and included
in arm64 defconfig).

So, assuming it is the debug power domain, I believe I can make the
device boot successfully by either:

  (a) Turning on the debug power domain:
      It seems like the kernel cannot do this on msm8916 at the moment(?)
      (msm8916.dtsi does not declare any power domain in the coresight
       device tree nodes)

      I cannot modify the firmware of this device,
      so I'm afraid I have absolutely no idea how to turn it on. :/

  (b) Preventing the crash:
      Is there some way to:

       (1) Add a check in the AMBA bus code to verify if the power
           domain is actually turned on?

No, there isn't, unless the DT tells you that the device is disabled, just
like your patch does.

      or
       (2) Recover from the "synchronous external abort" and continue
           booting after printing an error/warning?
           (At the moment, userspace seems to continue for a while,
            but stops working at some point after the error...)

Unfortunately, no. There is no way to do that from the kernel.


      Otherwise, there is still the option to prevent the AMBA bus code
      from running by disabling the affected device tree nodes.
      That's what the debug@850000 { status = "disabled"; }; ... snippet
      from my first mail [3] does, and it is the only way to make the
      kernel boot successfully at the moment.

For your board, I would say, this is the best option and the reasonable
solution.


      It wouldn't affect any other device if placed in the DTS for my
      device (i.e. *not* in the shared msm8916.dtsi).

Ultimately, the device tree is based on the assumption that you are running with
a firmware that supports the power domain and thus is fine for upstream. If
someone is using a firmware that doesn't support this, it is better to disable
the nodes, just like you did.

Personally, I would leave the upstream DTS as it is and expect users to
fix up their DTS for their firmware.
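
As a minimal sketch of such a board-level fixup: the fragment below disables
the two CoreSight nodes that are named in this thread from a device-specific
.dts, leaving the shared msm8916.dtsi untouched. It assumes that msm8916.dtsi
exposes a `soc` label on the node containing them; the remaining CoreSight
nodes elided earlier in the thread are not listed here and would need the same
treatment.

```dts
/dts-v1/;

/* Hypothetical board file; the &soc label is an assumption about msm8916.dtsi. */
#include "msm8916.dtsi"

&soc {
	/* Node names as quoted in this thread. */
	tpiu@820000 { status = "disabled"; };
	debug@850000 { status = "disabled"; };

	/* The other CoreSight nodes in msm8916.dtsi follow the same pattern. */
};
```

With `status = "disabled"`, no device is created for the node, so the AMBA bus
code never touches the unpowered registers.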

Kind regards
Suzuki


