Hi Jonathan, Sam, Am Mittwoch, 5. Juni 2024, 21:45:42 CEST schrieb Jonathan Bennett: > On 12/8/23 11:27 AM, Sam Edwards wrote: > > On 12/8/23 04:05, Heiko Stübner wrote: > >> Am Freitag, 8. Dezember 2023, 07:25:10 CET schrieb Sam Edwards: > >>> The RK3588 PCIe 3.0 controller seems to have unpredictable behavior > >>> when > >>> no CLKREQ/PERST/WAKE pins are configured in the pinmux. In > >>> particular, it > >>> will sometimes (varying between specific RK3588 chips, not over > >>> time) shut > >>> off the DBI block, and reads to this range will instead stall > >>> indefinitely. > >>> > >>> When this happens, it will prevent Linux from booting altogether. The > >>> PCIe driver will stall the CPU core once it attempts to read the > >>> version > >>> information from the DBI range. > >>> > >>> Fix this boot hang by adding the correct pinctrl configuration to the > >>> PCIe 3.0 device node, which is the proper thing to do anyway. While > >>> we're at it, also add the necessary configuration to the PCIe 2.0 node, > >>> which may or may not fix the equivalent problem over there -- but is > >>> the > >>> proper thing to do anyway. :) > >>> > >>> Fixes: 2806a69f3fef6 ("arm64: dts: rockchip: Add Turing RK1 SoM > >>> support") > >>> Signed-off-by: Sam Edwards <CFSworks@xxxxxxxxx> > >>> --- > >>> > >>> Hi list, > >>> > >>> Compared to v1, v2 removes the `reset-gpios` properties as well -- > >>> this should > >>> give control of the PCIe resets exclusively to the PCIe cores. (And > >>> even if the > >>> `reset-gpios` props had no effect in v1, it'd be confusing to have > >>> them there.) > >> > >> Hmm, I'd think this could result in differing behaviour. > >> > >> I.e. I tried the same on a different board with a nvme drive on the > >> pci30x4 > >> controller. But moving the reset from the gpio-way to "just" setting the > >> perstn pinctrl, simply hung the controller when probing the device. > > > > Ah, I'm guessing it died in: > > ver = dw_pcie_readl_dbi(pci, PCIE_VERSION_NUMBER); > > > > If so, that's the same hang that this patch is aiming to solve. I'm > > starting to wonder if having muxed pins != 1 for a given signal upsets > > the RC(s). Is your board (in an earlier boot stage: > > reset/maskrom/bootloader) muxing a different perstn pin option to that > > controller (and Linux doesn't know to clear it away)? Then when you > > add the perstn pinctrl the total number of perstns muxed to the > > controller comes to 2, and without a reset-gpios = <...>; to take it > > away again, that number stays at 2 to cause the hang upon the DBI read? > > > > If so, that'd mean the change resolves the hang for me, because it > > brings the total up to 1 (from 0), but also causes the hang for you, > > because it brings the total up to 2 (away from 1). > > > >> > >> So I guess I'd think the best way would be to split the pinctrl up > >> into the > >> 3 separate functions (clkreqn, perstn, waken) so that boards can include > >> them individually. > > > > Mm, maybe. Though that might be more appropriate if a board comes > > along that doesn't connect all of those signals to the same group of > > pins. I worry that attempting to solve this hang by doing that will > > instead just mask the real problem. > > > >> > >> Nobody is using the controller pinctrl entries so far anyway :-) . > > > > Then it's interesting that this board requires them in order to avoid > > a hang on boot. I'll see if there's anything else that I can find out. > > I've just finished testing the latest iteration of this patch with > 6.10-rc2 on my RK1 on a Turing Pi 2 carrier board. I can confirm that > unpatched mainline fails to boot with the same hang described here, and > the patch does make the board boot again. Can you possibly test if https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28b8d7793b8573563b3d45321376f36168d77b1e changes anything? In 6.11-rc1 now. The PERST# toggling happening before that patch could've caused issues with your PCIe device. Heiko