Re: Coresight causes synchronous external abort on msm8916

Mathieu Poirier <mathieu.poirier@xxxxxxxxxx> · Thu, 20 Jun 2019 09:00:50 -0600

On Thu, 20 Jun 2019 at 03:06, Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote:
>
>
>
> On 20/06/2019 07:29, Sai Prakash Ranjan wrote:
> > Hi Stephan,
> >
> > On 6/20/2019 12:09 AM, Stephan Gerhold wrote:
> >> Hi,
> >>
> >> On Wed, Jun 19, 2019 at 09:49:03AM +0100, Suzuki K Poulose wrote:
> >>> Hi Stephan,
> >>>
> >>> On 18/06/2019 21:26, Stephan Gerhold wrote:
> >>>> Hi,
> >>>>
> >>>> I'm trying to run mainline Linux on a smartphone with MSM8916 SoC.
> >>>> It works surprisingly well, but the coresight devices seem to cause the
> >>>> following crash shortly after userspace starts:
> >>>>
> >>>>        Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> >>>
> >>> ...
> >>>
> >>>
> >>>>
> >>>> In this case I'm using a simple device tree similar to apq8016-sbc,
> >>>> but it also happens using something as simple as msm8916-mtp.dts
> >>>> on this particular device.
> >>>>      (Attached: dmesg log with msm8916-mtp.dts and arm64 defconfig)
> >>>>
> >>>> I can avoid the crash and boot without any further problems by disabling
> >>>> every coresight device defined in msm8916.dtsi, e.g.:
> >>>>
> >>>>    tpiu@820000 { status = "disabled"; };
> >>>
> >>> ...
> >>>
> >>>>
> >>>> I don't have any use for coresight at the moment,
> >>>> but it seems somewhat odd to put this in the device specific dts.
> >>>>
> >>>> Any idea what could be causing this crash?
> >>>
> >>> This is most likely due to the missing power domain support. The CoreSight
> >>> components are usually in a debug power domain, so accesses to them can
> >>> abort unless that domain is turned on, either by specifying proper power
> >>> domain IDs for the power management protocol supported by the firmware, or
> >>> via other hacks (e.g. connecting a DS-5 to keep the debug power domain
> >>> turned on, which works on Juno).
> >>
> >> Interesting, thanks a lot!
> >>
> >> In this case I'm wondering how it works on the Dragonboard 410c.
> >> Does it enable these power domains in the firmware?
> >>     (Assuming it boots without this error...)
> >>
> >> If coresight is not working properly on all/most msm8916 devices,
> >> shouldn't coresight be disabled by default in msm8916.dtsi?
> >> At least until those power domains can be set up by the kernel.
> >>
> >> If this is a device-specific issue, what would be an acceptable solution
> >> for mainline?
> >> Can I turn on these power domains from the kernel?
> >> Or is it fine to disable coresight for this device with the snippet above?
> >>
> >> I'm not actually trying to use coresight, I just want the device to boot :)
> >> And since I am considering submitting my device tree for inclusion in
> >> mainline, I want to ask in advance how I should tackle this problem.
> >>
> >> Thanks!
> >> Stephan
> >>
> >
> > This doesn't seem like a cpuidle or debug power domain issue, but looks
>
> We have not even reached the CoreSight driver yet: we crash in the AMBA bus
> layer while trying to read the PID of the CoreSight device. So I doubt this is
> the issue your patch is trying to address. I still think this is a debug power
> domain issue. More on your patch below.
>
> > like a CPU affinity issue. Can you please try out this patch and let us
> > know?
>
> In general I am in favour of the patch, since it breaks the "assumption" that
> a missing CPU phandle gives you an affinity of "0".
>
> >
> > diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > index e8819d750938..9acf9f190d42 100644
> > --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c
> > @@ -579,7 +579,11 @@ static int debug_probe(struct amba_device *adev,
> > const struct amba_id *id)
> >       if (!drvdata)
> >               return -ENOMEM;
> >
> > -     drvdata->cpu = np ? of_coresight_get_cpu(np) : 0;
> > +     drvdata->cpu = np ? of_coresight_get_cpu(np) : -ENODEV;
>
>
> of_coresight_get_cpu() must be modified to return -ENODEV, rather than
> defaulting to 0. This is something that is required by the CTI driver too.
> And let's not bring something up and assume it belongs to CPU0.
>
> > +     if (drvdata->cpu == -ENODEV) {
> > +             return -ENODEV;
> > +     }
> > +
> >       if (per_cpu(debug_drvdata, drvdata->cpu)) {
> >               dev_err(dev, "CPU%d drvdata has already been initialized\n",
> >                       drvdata->cpu);
> > diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c
> > b/drivers/hwtracing/coresight/coresight-etm4x.c
> > index 8bb0092c7ec2..660432acbac0 100644
> > --- a/drivers/hwtracing/coresight/coresight-etm4x.c
> > +++ b/drivers/hwtracing/coresight/coresight-etm4x.c
> > @@ -1107,7 +1107,10 @@ static int etm4_probe(struct amba_device *adev,
> > const struct amba_id *id)
> >
> >       spin_lock_init(&drvdata->spinlock);
> >
> > -     drvdata->cpu = pdata ? pdata->cpu : 0;
>
> I believe we should simply abort when we don't have pdata. There is no point
> in registering this ETM unless we know which CPU it is connected to.
>
> > +     drvdata->cpu = pdata ? pdata->cpu : -ENODEV;
> > +     if (drvdata->cpu == -ENODEV) {
> > +             return -ENODEV;
> > +     }
>
> >
> >       cpus_read_lock();
> >       etmdrvdata[drvdata->cpu] = drvdata;
> > diff --git a/drivers/hwtracing/coresight/of_coresight.c
> > b/drivers/hwtracing/coresight/of_coresight.c
> > index 7045930fc958..8c1b90ba233c 100644
> > --- a/drivers/hwtracing/coresight/of_coresight.c
> > +++ b/drivers/hwtracing/coresight/of_coresight.c
> > @@ -153,14 +153,14 @@ int of_coresight_get_cpu(const struct device_node
> > *node)
> >       struct device_node *dn;
> >
> >       dn = of_parse_phandle(node, "cpu", 0);
> > -     /* Affinity defaults to CPU0 */
> > +     /* Affinity defaults to invalid */
> >       if (!dn)
> > -             return 0;
> > +             return -ENODEV;
> >       cpu = of_cpu_node_to_id(dn);
> >       of_node_put(dn);
> >
> > -     /* Affinity to CPU0 if no cpu nodes are found */
> > -     return (cpu < 0) ? 0 : cpu;
> > +     /* Affinity to invalid if no cpu nodes are found */
> > +     return (cpu < 0) ? -ENODEV : cpu;
>
>         return cpu ?
>
> If you split this into 3 different patches, I would be happy to Ack them.
>
> Mathieu,
>
> What do you think ?

I'm all for it.  Defaulting to '0' was valid in an era that is long
gone and needs to be fixed.
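
To illustrate why the CPU0 default is harmful, here is a minimal user-space
sketch, not kernel code. Every name in it (model_node, model_get_cpu_old,
model_get_cpu_new, model_probe, MODEL_NR_CPUS) is hypothetical and only
models the behaviour discussed above: the lookup either falls back to 0 or
returns -ENODEV, and the probe refuses a per-CPU slot that is already taken.
The -EBUSY and -EINVAL return values are assumptions of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Hypothetical stand-in for a device tree node describing a per-CPU device. */
struct model_node {
	int has_cpu_phandle;	/* non-zero if a "cpu" phandle is present */
	int cpu_id;		/* logical CPU, negative if the lookup failed */
};

/* Models the old lookup: any failure silently becomes CPU0. */
int model_get_cpu_old(const struct model_node *node)
{
	if (!node->has_cpu_phandle)
		return 0;
	return (node->cpu_id < 0) ? 0 : node->cpu_id;
}

/* Models the patched lookup: failures are reported as -ENODEV. */
int model_get_cpu_new(const struct model_node *node)
{
	if (!node->has_cpu_phandle)
		return -ENODEV;
	return (node->cpu_id < 0) ? -ENODEV : node->cpu_id;
}

/*
 * Models a probe that claims one slot per CPU. A negative lookup result
 * aborts the probe; an occupied slot is rejected with -EBUSY.
 */
int model_probe(const struct model_node *slots[MODEL_NR_CPUS],
		const struct model_node *node,
		int (*get_cpu)(const struct model_node *))
{
	int cpu;

	assert(slots && node && get_cpu);

	cpu = get_cpu(node);
	if (cpu < 0)
		return cpu;
	if (cpu >= MODEL_NR_CPUS)
		return -EINVAL;
	if (slots[cpu])
		return -EBUSY;

	slots[cpu] = node;
	return 0;
}
```

In this model, a node without a "cpu" phandle that is probed first with the
old lookup takes the CPU0 slot, and the device that really belongs to CPU0
is then rejected. With the new lookup the orphan node fails with -ENODEV and
the real CPU0 device probes normally.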

>
>
> Cheers
> Suzuki