Re: [PATCH V4] PCI: rcar: Add L1 link state fix into data abort hook

Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> · Thu, 10 Dec 2020 12:12:50 +0000

On Tue, Dec 08, 2020 at 12:46:27PM -0600, Bjorn Helgaas wrote:
> On Tue, Dec 08, 2020 at 07:05:09PM +0100, Marek Vasut wrote:
> > On 12/8/20 5:40 PM, Bjorn Helgaas wrote:
> 
> > > > +static const struct of_device_id rcar_pcie_abort_handler_of_match[] __initconst = {
> > > > +	{ .compatible = "renesas,pcie-r8a7779" },
> > > > +	{ .compatible = "renesas,pcie-r8a7790" },
> > > > +	{ .compatible = "renesas,pcie-r8a7791" },
> > > > +	{ .compatible = "renesas,pcie-rcar-gen2" },
> > > > +	{},
> > > > +};
> > > 
> > > Why do we need another copy of these, as opposed to doing something
> > > with of_device_get_match_data(), e.g., like brcm_pcie_probe() does?
> > 
> > This is not a copy, but a subset of SoCs which are affected by this
> > problem.
> 
> I know it's not a complete copy.  Many systems include flags like
> "broken_l1" in their match_data.  Something like this:
> 
>   struct rcar_pcie_drvdata {
>     int            (*phy_init_fn)(struct rcar_pcie_host *host);
>     unsigned int   broken_l1:1;
>   };
> 
>   static const struct rcar_pcie_drvdata rcar_init_h1_drvdata = {
>     .phy_init_fn = rcar_pcie_phy_init_h1,
>     .broken_l1 = 1,
>   };
> 
>   static const struct rcar_pcie_drvdata rcar_init_gen2_drvdata = {
>     .phy_init_fn = rcar_pcie_phy_init_gen2,
>     .broken_l1 = 1,
>   };
> 
>   static const struct rcar_pcie_drvdata rcar_init_gen3_drvdata = {
>     .phy_init_fn = rcar_pcie_phy_init_gen3,
>   };
> 
>   static const struct of_device_id rcar_pcie_of_match[] = {
>     { .compatible = "renesas,pcie-r8a7779", .data = &rcar_init_h1_drvdata },
>     { .compatible = "renesas,pcie-r8a7790", .data = &rcar_init_gen2_drvdata },
>     { .compatible = "renesas,pcie-r8a7791", .data = &rcar_init_gen2_drvdata },
>     ...

+1
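
To illustrate how a single match table could replace the second one, here
is a rough userspace sketch of the lookup, not the actual driver code. The
mock_ names and the phy_init_name field are stand-ins invented for this
example, and the gen3 compatible string is an assumption, since the table
quoted above is elided at that point. In the driver itself the lookup would
presumably go through of_device_get_match_data() in probe.

```c
#include <stddef.h>
#include <string.h>

/* Per-SoC data, modelled on the struct sketched in the quoted review. */
struct rcar_pcie_drvdata {
	const char *phy_init_name;	/* hypothetical stand-in for phy_init_fn */
	unsigned int broken_l1:1;	/* set on SoCs affected by the L1 problem */
};

/* Hypothetical userspace stand-in for struct of_device_id. */
struct mock_of_device_id {
	const char *compatible;
	const void *data;
};

static const struct rcar_pcie_drvdata rcar_init_h1_drvdata = {
	.phy_init_name = "h1",
	.broken_l1 = 1,
};

static const struct rcar_pcie_drvdata rcar_init_gen2_drvdata = {
	.phy_init_name = "gen2",
	.broken_l1 = 1,
};

static const struct rcar_pcie_drvdata rcar_init_gen3_drvdata = {
	.phy_init_name = "gen3",
};

static const struct mock_of_device_id rcar_pcie_of_match[] = {
	{ .compatible = "renesas,pcie-r8a7779", .data = &rcar_init_h1_drvdata },
	{ .compatible = "renesas,pcie-r8a7790", .data = &rcar_init_gen2_drvdata },
	{ .compatible = "renesas,pcie-r8a7791", .data = &rcar_init_gen2_drvdata },
	/* Assumed entry, only here to show a SoC without the quirk. */
	{ .compatible = "renesas,pcie-rcar-gen3", .data = &rcar_init_gen3_drvdata },
	{ .compatible = NULL, .data = NULL },	/* sentinel */
};

/* Walk the table until the sentinel; return the matching data or NULL. */
static const struct rcar_pcie_drvdata *mock_get_match_data(const char *compatible)
{
	const struct mock_of_device_id *id;

	for (id = rcar_pcie_of_match; id->compatible; id++) {
		if (strcmp(id->compatible, compatible) == 0)
			return id->data;
	}

	return NULL;
}

/* Returns 1 if the SoC is flagged broken_l1 and would need the abort hook. */
static int rcar_pcie_needs_abort_hook(const char *compatible)
{
	const struct rcar_pcie_drvdata *drvdata = mock_get_match_data(compatible);

	return drvdata && drvdata->broken_l1;
}
```

With this layout the list of affected SoCs lives in one place, and the
decision whether to install the abort hook reduces to testing broken_l1 on
the matched entry.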

> > > > +static int __init rcar_pcie_init(void)
> > > > +{
> > > > +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> > > > +#ifdef CONFIG_ARM_LPAE
> > > > +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > +				"asynchronous external abort");
> > > > +#else
> > > > +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > +				"imprecise external abort");
> > > > +#endif
> > > > +	}
> > > > +
> > > > +	return platform_driver_register(&rcar_pcie_driver);
> > > > +}
> > > > +device_initcall(rcar_pcie_init);
> > > > +#else
> > > >   builtin_platform_driver(rcar_pcie_driver);
> > > > +#endif
> > > 
> > > Is the device_initcall() vs builtin_platform_driver() something
> > > related to the hook_fault_code()?  What would break if this were
> > > always builtin_platform_driver()?
> > 
> > rcar_pcie_init() would not be called before probe.
> 
> Sorry to be slow, but why does it need to be called before probe?
> Obviously software isn't putting the controller in D3 or enabling ASPM
> before probe.

I don't understand it either so it would be good to clarify.

Also, some of these platforms are SMP systems. I don't understand
what prevents multiple cores from faulting at once, given that the
faults can happen for config/io/mem accesses alike.

I understand that the immediate fix is for S2R, which is
single-threaded, but I would like to understand how comprehensive
this fix is.

Thanks,
Lorenzo