On Wed, May 19, 2021 at 9:04 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>
> [ +linux-pci for visibility ]
>
> On 2021-05-18 10:09, Alexandru Elisei wrote:
> > After doing a git bisect I was able to trace the following error when booting my
> > rockpro64 v2 (rk3399 SoC) with a PCIE NVME expansion card:
> >
> > [..]
> > [ 0.305183] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > [ 0.305248] rockchip-pcie f8000000.pcie: MEM 0x00fa000000..0x00fbdfffff ->
> > 0x00fa000000
> > [ 0.305285] rockchip-pcie f8000000.pcie: IO 0x00fbe00000..0x00fbefffff ->
> > 0x00fbe00000
> > [ 0.306201] rockchip-pcie f8000000.pcie: supply vpcie1v8 not found, using dummy
> > regulator
> > [ 0.306334] rockchip-pcie f8000000.pcie: supply vpcie0v9 not found, using dummy
> > regulator
> > [ 0.373705] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > [ 0.373730] pci_bus 0000:00: root bus resource [bus 00-1f]
> > [ 0.373751] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit]
> > [ 0.373777] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus
> > address [0xfbe00000-0xfbefffff])
> > [ 0.373839] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > [ 0.373973] pci 0000:00:00.0: supports D1
> > [ 0.373992] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > [ 0.378518] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]),
> > reconfiguring
> > [ 0.378765] pci 0000:01:00.0: [144d:a808] type 00 class 0x010802
> > [ 0.378869] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
> > [ 0.379051] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > [ 0.379661] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth, limited by
> > 2.5 GT/s PCIe x4 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe
> > x4 link)
> > [ 0.393269] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
> > [ 0.393311] pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000]
> > [ 0.393333] pci 0000:00:00.0: BAR 14: failed to assign [mem size 0x00100000]
> > [ 0.393356] pci 0000:01:00.0: BAR 0: no space for [mem size 0x00004000 64bit]
> > [ 0.393375] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x00004000 64bit]
> > [ 0.393397] pci 0000:00:00.0: PCI bridge to [bus 01]
> > [ 0.393839] pcieport 0000:00:00.0: PME: Signaling with IRQ 78
> > [ 0.394165] pcieport 0000:00:00.0: AER: enabled with IRQ 78
> > [..]
> >
> > to the commit 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for
> > 64-bit memory addresses").
>
> FWIW, my hunch is that the host bridge advertising no 32-bit memory
> resource, only a single 64-bit non-prefetchable one (even though
> it's entirely below 4GB) might be a bit weird and tripping something up
> in the resource assignment code. It certainly seems like the thing most
> directly related to the offending commit.
>
> I'd be tempted to try fiddling with that in the DT (i.e. changing
> 0x83000000 to 0x82000000 in the PCIe node's "ranges" property) to see if
> it makes any difference. Note that even if it helps, though, I don't
> know whether that's the correct fix or just a bodge around a corner-case
> bug somewhere in the resource code.
>
> Robin.

Good Morning Robin,

It seems we meet again over PCIe issues. I think you might be onto something about the resource assignment code doing weird things.

I'm doing early bringup on the rk3566, which has a 1GB address space at 0x3 0x00000000 for the PCIe controller. I started with the recent linux-next, so this patch was already applied. Since it has a large enough address space, I decided to try and get a DGPU to work with it. I kept hitting strange issues, such as it randomly failing to allocate 32-bit BARs in the 64-bit space. I tried messing with the ranges to force it to 32-bit, but it would still be flagged as a 64-bit space when finally allocated (I thought this might be due to the location of the memory).

Here are the ranges that I eventually got to allocate correctly (once).
ranges = <0x01000000 0x0 0x00800000 0x3 0x00800000 0x0 0x00100000
          0x02000000 0x0 0x00900000 0x3 0x00900000 0x0 0x30000000
          0x43000000 0x0 0x30900000 0x3 0x30900000 0x0 0x0f700000>;

It did weird things when I'd change that 0x02000000 to a 0x03000000, even though the final allocation was flagged as 64bit.

Thanks for everything!
Peter

>
> > For reference, here is the dmesg output when BAR
> > reassignment works:
> >
> > [..]
> > [ 0.307381] rockchip-pcie f8000000.pcie: host bridge /pcie@f8000000 ranges:
> > [ 0.307445] rockchip-pcie f8000000.pcie: MEM 0x00fa000000..0x00fbdfffff ->
> > 0x00fa000000
> > [ 0.307481] rockchip-pcie f8000000.pcie: IO 0x00fbe00000..0x00fbefffff ->
> > 0x00fbe00000
> > [ 0.308406] rockchip-pcie f8000000.pcie: supply vpcie1v8 not found, using dummy
> > regulator
> > [ 0.308534] rockchip-pcie f8000000.pcie: supply vpcie0v9 not found, using dummy
> > regulator
> > [ 0.374676] rockchip-pcie f8000000.pcie: PCI host bridge to bus 0000:00
> > [ 0.374701] pci_bus 0000:00: root bus resource [bus 00-1f]
> > [ 0.374723] pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff]
> > [ 0.374746] pci_bus 0000:00: root bus resource [io 0x0000-0xfffff] (bus
> > address [0xfbe00000-0xfbefffff])
> > [ 0.374808] pci 0000:00:00.0: [1d87:0100] type 01 class 0x060400
> > [ 0.374943] pci 0000:00:00.0: supports D1
> > [ 0.374961] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
> > [ 0.379473] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]),
> > reconfiguring
> > [ 0.379712] pci 0000:01:00.0: [144d:a808] type 00 class 0x010802
> > [ 0.379815] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
> > [ 0.379997] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max 256)
> > [ 0.380607] pci 0000:01:00.0: 8.000 Gb/s available PCIe bandwidth, limited by
> > 2.5 GT/s PCIe x4 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe
> > x4 link)
> > [ 0.394239] pci_bus 0000:01: busn_res: [bus 01-1f] end is updated to 01
> > [ 0.394285] pci 0000:00:00.0: BAR 14: assigned [mem 0xfa000000-0xfa0fffff]
> > [ 0.394312] pci 0000:01:00.0: BAR 0: assigned [mem 0xfa000000-0xfa003fff 64bit]
> > [ 0.394374] pci 0000:00:00.0: PCI bridge to [bus 01]
> > [ 0.394395] pci 0000:00:00.0: bridge window [mem 0xfa000000-0xfa0fffff]
> > [ 0.394569] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
> > [ 0.394845] pcieport 0000:00:00.0: PME: Signaling with IRQ 78
> > [ 0.395153] pcieport 0000:00:00.0: AER: enabled with IRQ 78
> > [..]
> >
> > And here is the output of lspci when BAR reassignment works:
> >
> > # lspci -v
> > 00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd RK3399 PCI Express Root
> > Port (prog-if 00 [Normal decode])
> > Flags: bus master, fast devsel, latency 0, IRQ 78
> > Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > I/O behind bridge: 00000000-00000fff [size=4K]
> > Memory behind bridge: fa000000-fa0fffff [size=1M]
> > Prefetchable memory behind bridge: 00000000-000fffff [size=1M]
> > Capabilities: [80] Power Management version 3
> > Capabilities: [90] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> > Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
> > Capabilities: [c0] Express Root Port (Slot+), MSI 00
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [274] Transaction Processing Hints
> > Kernel driver in use: pcieport
> > lspci: Unable to load libkmod resources: error -2
> >
> > 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
> > Controller SM981/PM981/PM983 (prog-if 02 [NVM Express])
> > Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> > Flags: bus master, fast devsel, latency 0, IRQ 77, NUMA node 0
> > Memory at fa000000 (64-bit, non-prefetchable) [size=16K]
> > Capabilities: [40] Power Management version 3
> > Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> > Capabilities: [70] Express Endpoint, MSI 00
> > Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
> > Capabilities: [100] Advanced Error Reporting
> > Capabilities: [148] Device Serial Number 00-00-00-00-00-00-00-00
> > Capabilities: [158] Power Budgeting <?>
> > Capabilities: [168] Secondary PCI Express
> > Capabilities: [188] Latency Tolerance Reporting
> > Capabilities: [190] L1 PM Substates
> > Kernel driver in use: nvme
> >
> > I can provide more information if needed (the board is sitting on my desk) and I
> > can help with testing the fix.
> >
> > Thanks,
> >
> > Alex
> >
> >
> > _______________________________________________
> > Linux-rockchip mailing list
> > Linux-rockchip@xxxxxxxxxxxxxxxxxxx
> > http://lists.infradead.org/mailman/listinfo/linux-rockchip
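
As a reading aid for the "ranges" values discussed in this thread, here is a minimal Python sketch that decodes the first cell (phys.hi) of each entry. It assumes the Open Firmware PCI bus binding layout (bits 25:24 are the space code, bit 30 is the prefetchable flag, bit 31 is the non-relocatable flag) and that each entry is seven cells wide (three PCI address cells, two parent address cells, two size cells). That width matches the rk3566 ranges quoted above but is not stated anywhere in the thread. The function and variable names are made up for illustration.

```python
# Sketch only: decode PCI "ranges" entries, assuming the Open Firmware
# PCI bus binding layout for the first (phys.hi) cell.
SPACE_CODES = {0b00: "config", 0b01: "io", 0b10: "mem32", 0b11: "mem64"}


def decode_phys_hi(cell):
    """Return the address-space type and flags encoded in a phys.hi cell."""
    return {
        "space": SPACE_CODES[(cell >> 24) & 0x3],   # ss bits 25:24
        "prefetchable": bool(cell & (1 << 30)),     # p bit
        "non_relocatable": bool(cell & (1 << 31)),  # n bit
    }


def decode_ranges(cells, entry_len=7):
    """Split a flat cell list into entries (assumed 3 + 2 + 2 cells each)."""
    entries = []
    for i in range(0, len(cells), entry_len):
        hi, pci_mid, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = cells[i:i + entry_len]
        entry = decode_phys_hi(hi)
        entry["pci_addr"] = (pci_mid << 32) | pci_lo
        entry["cpu_addr"] = (cpu_hi << 32) | cpu_lo
        entry["size"] = (size_hi << 32) | size_lo
        entries.append(entry)
    return entries


# The rk3566 ranges quoted in this thread.
RK3566_RANGES = [
    0x01000000, 0x0, 0x00800000, 0x3, 0x00800000, 0x0, 0x00100000,
    0x02000000, 0x0, 0x00900000, 0x3, 0x00900000, 0x0, 0x30000000,
    0x43000000, 0x0, 0x30900000, 0x3, 0x30900000, 0x0, 0x0f700000,
]

if __name__ == "__main__":
    for e in decode_ranges(RK3566_RANGES):
        print(f'{e["space"]:6} prefetchable={e["prefetchable"]!s:5} '
              f'cpu={e["cpu_addr"]:#011x} size={e["size"]:#x}')
    # The two values suggested for the rk3399 experiment:
    for cell in (0x83000000, 0x82000000):
        print(f"{cell:#010x} -> {decode_phys_hi(cell)['space']}")
```

Under those assumptions, 0x83000000 decodes as a 64-bit memory space and 0x82000000 as a 32-bit one, and 0x02000000 versus 0x03000000 differ in the same bit. The sketch only reads the device tree encoding. It says nothing about how the kernel's resource assignment code treats the resulting flags.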