On Thu, 20 Jun 2019 at 03:06, Suzuki K Poulose <suzuki.poulose@xxxxxxx> wrote: > > > > On 20/06/2019 07:29, Sai Prakash Ranjan wrote: > > Hi Stephan, > > > > On 6/20/2019 12:09 AM, Stephan Gerhold wrote: > >> Hi, > >> > >> On Wed, Jun 19, 2019 at 09:49:03AM +0100, Suzuki K Poulose wrote: > >>> Hi Stephan, > >>> > >>> On 18/06/2019 21:26, Stephan Gerhold wrote: > >>>> Hi, > >>>> > >>>> I'm trying to run mainline Linux on a smartphone with MSM8916 SoC. > >>>> It works surprisingly well, but the coresight devices seem to cause the > >>>> following crash shortly after userspace starts: > >>>> > >>>> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP > >>> > >>> ... > >>> > >>> > >>>> > >>>> In this case I'm using a simple device tree similar to apq8016-sbc, > >>>> but it also happens using something as simple as msm8916-mtp.dts > >>>> on this particular device. > >>>> (Attached: dmesg log with msm8916-mtp.dts and arm64 defconfig) > >>>> > >>>> I can avoid the crash and boot without any further problems by disabling > >>>> every coresight device defined in msm8916.dtsi, e.g.: > >>>> > >>>> tpiu@820000 { status = "disabled"; }; > >>> > >>> ... > >>> > >>>> > >>>> I don't have any use for coresight at the moment, > >>>> but it seems somewhat odd to put this in the device specific dts. > >>>> > >>>> Any idea what could be causing this crash? > >>> > >>> This is mostly due to the missing power domain support. The CoreSight > >>> components are usually in a debug power domain. So unless that is turned on, > >>> (either by specifying proper power domain ids for power management protocol > >>> supported by the firmware OR via other hacks - e.g, connecting a DS-5 to > >>> keep the debug power domain turned on , this works on Juno -). > >> > >> Interesting, thanks a lot! > >> > >> In this case I'm wondering how it works on the Dragonboard 410c. > >> Does it enable these power domains in the firmware? > >> (Assuming it boots without this error...) > >> > >> If coresight is not working properly on all/most msm8916 devices, > >> shouldn't coresight be disabled by default in msm8916.dtsi? > >> At least until those power domains can be set up by the kernel. > >> > >> If this is a device-specific issue, what would be an acceptable solution > >> for mainline? > >> Can I turn on these power domains from the kernel? > >> Or is it fine to disable coresight for this device with the snippet above? > >> > >> I'm not actually trying to use coresight, I just want the device to boot :) > >> And since I am considering submitting my device tree for inclusion in > >> mainline, I want to ask in advance how I should tackle this problem. > >> > >> Thanks! > >> Stephan > >> > > > > This doesn't seem like cpuidle or debug power domain issue, but looks > > We are not yet there in the Coresight driver and we crash at AMBA bus layer > trying to read the PID of the CoreSight device. So I doubt if this is an > issue your patch trying to address. I still think this is a debug power domain > issue. More your patch below. > > > like cpu affinity issue. Can you please try out this patch and let us > > know? > > In general I am for the patch, breaking the "assumption" that a missing CPU > phandle gives you the affinity of "0". > > > > > diff --git a/drivers/hwtracing/coresight/coresight-cpu-debug.c > > b/drivers/hwtracing/coresight/coresight-cpu-debug.c > > index e8819d750938..9acf9f190d42 100644 > > --- a/drivers/hwtracing/coresight/coresight-cpu-debug.c > > +++ b/drivers/hwtracing/coresight/coresight-cpu-debug.c > > @@ -579,7 +579,11 @@ static int debug_probe(struct amba_device *adev, > > const struct amba_id *id) > > if (!drvdata) > > return -ENOMEM; > > > > - drvdata->cpu = np ? of_coresight_get_cpu(np) : 0; > > + drvdata->cpu = np ? of_coresight_get_cpu(np) : -ENODEV; > > > of_coresight_get_cpu() must be modified to return -ENODEV, rather than > defaulting to 0. This is something that is required by the CTI driver too. > And lets not bring up something and assume it belongs to CPU0. > > > + if (drvdata->cpu == -ENODEV) { > > + return -ENODEV; > > + } > > + > > if (per_cpu(debug_drvdata, drvdata->cpu)) { > > dev_err(dev, "CPU%d drvdata has already been initialized\n", > > drvdata->cpu); > > diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c > > b/drivers/hwtracing/coresight/coresight-etm4x.c > > index 8bb0092c7ec2..660432acbac0 100644 > > --- a/drivers/hwtracing/coresight/coresight-etm4x.c > > +++ b/drivers/hwtracing/coresight/coresight-etm4x.c > > @@ -1107,7 +1107,10 @@ static int etm4_probe(struct amba_device *adev, > > const struct amba_id *id) > > > > spin_lock_init(&drvdata->spinlock); > > > > - drvdata->cpu = pdata ? pdata->cpu : 0; > > I believe, we should simply abort when we don't have pdata. There is no point > in registering this ETM unless we know where this is connected to. > > > + drvdata->cpu = pdata ? pdata->cpu : -ENODEV; > > + if (drvdata->cpu == -ENODEV) { > > + return -ENODEV; > > + } > > > > > cpus_read_lock(); > > etmdrvdata[drvdata->cpu] = drvdata; > > diff --git a/drivers/hwtracing/coresight/of_coresight.c > > b/drivers/hwtracing/coresight/of_coresight.c > > index 7045930fc958..8c1b90ba233c 100644 > > --- a/drivers/hwtracing/coresight/of_coresight.c > > +++ b/drivers/hwtracing/coresight/of_coresight.c > > @@ -153,14 +153,14 @@ int of_coresight_get_cpu(const struct device_node > > *node) > > struct device_node *dn; > > > > dn = of_parse_phandle(node, "cpu", 0); > > - /* Affinity defaults to CPU0 */ > > + /* Affinity defaults to invalid */ > > if (!dn) > > - return 0; > > + return -ENODEV; > > cpu = of_cpu_node_to_id(dn); > > of_node_put(dn); > > > > - /* Affinity to CPU0 if no cpu nodes are found */ > > - return (cpu < 0) ? 0 : cpu; > > + /* Affinity to invalid if no cpu nodes are found */ > > + return (cpu < 0) ? -ENODEV : cpu; > > return cpu ? > > If you split this into 3 different patches, I would be happy to Ack them. > > Mathieu, > > What do you think ? I'm all for it. Defaulting to '0' was valid in an era that is long gone and needs to be fixed. > > > Cheers > Suzuki