On Sat, Dec 12, 2020 at 08:12:16PM +0100, Marek Vasut wrote:
> On 12/10/20 1:12 PM, Lorenzo Pieralisi wrote:
> > [...]
>
> > > > > > +static int __init rcar_pcie_init(void)
> > > > > > +{
> > > > > > +	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
> > > > > > +#ifdef CONFIG_ARM_LPAE
> > > > > > +		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > > > +				"asynchronous external abort");
> > > > > > +#else
> > > > > > +		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
> > > > > > +				"imprecise external abort");
> > > > > > +#endif
> > > > > > +	}
> > > > > > +
> > > > > > +	return platform_driver_register(&rcar_pcie_driver);
> > > > > > +}
> > > > > > +device_initcall(rcar_pcie_init);
> > > > > > +#else
> > > > > >  builtin_platform_driver(rcar_pcie_driver);
> > > > > > +#endif
> > > > >
> > > > > Is the device_initcall() vs builtin_platform_driver() something
> > > > > related to the hook_fault_code()? What would break if this were
> > > > > always builtin_platform_driver()?
> > > >
> > > > rcar_pcie_init() would not be called before probe.
> > >
> > > Sorry to be slow, but why does it need to be called before probe?
> > > Obviously software isn't putting the controller in D3 or enabling ASPM
> > > before probe.
> >
> > I don't understand it either so it would be good to clarify.
>
> The hook_fault_code() is marked __init, so if probe() was deferred and the
> kernel __init memory was free'd, attempt to call hook_fault_code() from
> probe would lead to a crash.

Understood - I don't think there is any point in keeping the
builtin_platform_driver() call then; something like:

#ifdef CONFIG_ARM
...
static __init void init_platform_hook_fault(void)
{
	if (of_find_matching_node(NULL, rcar_pcie_abort_handler_of_match)) {
#ifdef CONFIG_ARM_LPAE
		hook_fault_code(17, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
				"asynchronous external abort");
#else
		hook_fault_code(22, rcar_pcie_aarch32_abort_handler, SIGBUS, 0,
				"imprecise external abort");
#endif
	}
}
#else
static inline void init_platform_hook_fault(void) {}
#endif

static int __init rcar_pcie_init(void)
{
	init_platform_hook_fault();
	return platform_driver_register(&rcar_pcie_driver);
}
device_initcall(rcar_pcie_init);

Or we remove the __init marker from hook_fault_code().

> > Also, some of these platforms are SMP systems, I don't understand
> > what prevents multiple cores to fault at once given that the faults
> > can happen for config/io/mem accesses alike.
> >
> > I understand that the immediate fix is for S2R, that is single
> > threaded but I would like to understand how comprehensive this fix
> > is.
>
> Are you suggesting to add some sort of locking ?

If we merge a fix, the fix has to work. Reading the code, if multiple
cores fault at once this fix seems to have an issue (you may still end
up with an unhandled fault), which is why I asked.

Lorenzo
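For illustration only, a userspace model of the SMP concern raised above. This is not the driver's code: the handler body and the names `link_in_l1`, `fix_applied_count`, `pmsr_lock`, `model_abort_handler()` and `run_model()` are hypothetical stand-ins, and a pthread mutex stands in for what would have to be a spin_lock_irqsave()-style lock in abort context. The point it sketches is that a handler doing a check-then-act on shared controller state must serialize that sequence, otherwise two CPUs faulting at once can both see the "needs recovery" state and both run (or interleave) the recovery writes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_CPUS 16

/* Hypothetical shared state standing in for a controller status register. */
static bool link_in_l1;
/* Counts how many times the (hypothetical) recovery sequence was issued. */
static int fix_applied_count;
/* Hypothetical lock serializing the check-then-act sequence. */
static pthread_mutex_t pmsr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical abort handler body: inspect state, apply a one-shot fix. */
static void model_abort_handler(void)
{
	pthread_mutex_lock(&pmsr_lock);
	if (!link_in_l1) {
		/*
		 * Without the lock, several "CPUs" could all observe
		 * !link_in_l1 here and all issue the recovery sequence.
		 */
		fix_applied_count++;
		link_in_l1 = true;
	}
	pthread_mutex_unlock(&pmsr_lock);
}

/* Each thread models one CPU taking the abort concurrently. */
static void *cpu_thread(void *arg)
{
	(void)arg;
	model_abort_handler();
	return NULL;
}

/* Runs ncpus concurrent "faults"; returns how often the fix was applied. */
int run_model(int ncpus)
{
	pthread_t threads[MAX_CPUS];
	int created = 0;

	assert(ncpus > 0 && ncpus <= MAX_CPUS);
	link_in_l1 = false;
	fix_applied_count = 0;

	for (int i = 0; i < ncpus; i++)
		if (pthread_create(&threads[created], NULL, cpu_thread, NULL) == 0)
			created++;
	for (int i = 0; i < created; i++)
		pthread_join(threads[i], NULL);

	return fix_applied_count;
}
```

With the lock in place, run_model(8) reports that the recovery sequence ran exactly once; dropping the lock/unlock pair turns the check-then-act into a race in which more than one thread can apply it. Whether the real handler needs exactly this kind of serialization depends on the state it touches, which is outside this excerpt.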