Re: [PATCH V4] PCI: rcar: Add L1 link state fix into data abort hook

Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> · Mon, 14 Dec 2020 17:13:14 +0000

On Sat, Dec 12, 2020 at 08:12:16PM +0100, Marek Vasut wrote:
> On 12/10/20 1:12 PM, Lorenzo Pieralisi wrote:
> 
> [...]
> 
> > > > > > +static int __init rcar_pcie_init(void)
> > > > > > +{
> > > > > > +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> > > > > > +#ifdef CONFIG_ARM_LPAE
> > > > > > +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > > > +				"asynchronous external abort");
> > > > > > +#else
> > > > > > +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > > > +				"imprecise external abort");
> > > > > > +#endif
> > > > > > +	}
> > > > > > +
> > > > > > +	return platform_driver_register(&rcar_pcie_driver);
> > > > > > +}
> > > > > > +device_initcall(rcar_pcie_init);
> > > > > > +#else
> > > > > >    builtin_platform_driver(rcar_pcie_driver);
> > > > > > +#endif
> > > > > 
> > > > > Is the device_initcall() vs builtin_platform_driver() something
> > > > > related to the hook_fault_code()?  What would break if this were
> > > > > always builtin_platform_driver()?
> > > > 
> > > > rcar_pcie_init() would not be called before probe.
> > > 
> > > Sorry to be slow, but why does it need to be called before probe?
> > > Obviously software isn't putting the controller in D3 or enabling ASPM
> > > before probe.
> > 
> > I don't understand it either so it would be good to clarify.
> 
> The hook_fault_code() is marked __init, so if probe() was deferred and the
> kernel __init memory was free'd, attempt to call hook_fault_code() from
> probe would lead to a crash.

Understood. In that case I don't see the point of keeping the
builtin_platform_driver() call at all. Something like:

#ifdef CONFIG_ARM
...
static void __init init_platform_hook_fault(void)
{
	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
#ifdef CONFIG_ARM_LPAE
		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
				"asynchronous external abort");
#else
		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
				"imprecise external abort");
#endif
	}
}
#else
static inline void init_platform_hook_fault(void) { }
#endif

static int __init rcar_pcie_init(void)
{
	init_platform_hook_fault();
	return platform_driver_register(&rcar_pcie_driver);
}
device_initcall(rcar_pcie_init);

Or we remove the __init marker from hook_fault_code().

> > Also, some of these platforms are SMP systems, I don't understand
> > what prevents multiple cores to fault at once given that the faults
> > can happen for config/io/mem accesses alike.
> > 
> > I understand that the immediate fix is for S2R, that is single
> > threaded but I would like to understand how comprehensive this fix
> > is.
> 
> Are you suggesting to add some sort of locking ?

If we merge a fix, the fix has to work. From reading the code, it seems
to have an issue if multiple cores fault at once, which is why I asked:
you may still end up with an unhandled fault.
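
To make the concern concrete, here is a toy userspace model of the
interleaving I have in mind. It is not the driver code: struct
fake_link and both functions are made up for illustration, and it
assumes the handler reports a fault as unhandled whenever it finds
nothing left to fix.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the controller state, not the real registers. */
struct fake_link {
	bool in_l1;	/* true while the link sits in L1 and accesses abort */
};

/*
 * Toy fault hook: returns 0 if it fixed the cause of the abort,
 * non-zero to tell the caller the fault is unhandled.
 */
static int fake_abort_handler(struct fake_link *link)
{
	if (!link->in_l1)
		return 1;	/* nothing to fix: reported as unhandled */

	link->in_l1 = false;	/* bring the link back */
	return 0;
}

/*
 * Two cores abort on the same L1 link, then run the handler back to
 * back. Returns what the second core's handler reports.
 */
static int second_core_result(void)
{
	struct fake_link link = { .in_l1 = true };

	(void)fake_abort_handler(&link);	/* core A: handled */
	return fake_abort_handler(&link);	/* core B: link already up */
}
```

In this model the second core's abort had the same cause as the first
one, but by the time its handler runs there is nothing left to detect,
so it is reported as unhandled. Whether the real handler can hit this
depends on details I cannot tell from the patch alone.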

Lorenzo