Re: [Bug 216795] New: PCI resource allocation mismatch with BIOS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 09, 2022 at 11:03:06AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216795
> 
>             Bug ID: 216795
>            Summary: PCI resource allocation mismatch with BIOS
>     Kernel Version: v6.1-rc8
>           Reporter: mika.westerberg@xxxxxxxxxxxxxxx
> 
> Created attachment 303384
>   --> https://bugzilla.kernel.org/attachment.cgi?id=303384&action=edit
> Dmesg from the system
> 
> The device in question is a GPU with an integrated PCIe switch connected
> to a root port of a system:
> 
> 0000:50:02.0 Root Port
>   0000:51:00.0 Switch Upstream Port
>   0000:52:01.0 Switch Downstream Port
>    0000:53:00.0 GPU Endpoint
> 
> The GPU has SRIOV capability and the BIOS allocates resources for these
> (see the attached dumps). However, if parts of the topology is removed
> through sysfs and then re-scanned the resource allocation fails and that
> leaves the GPU without any resources assigned.
> 
> The real use-case is in data centers if the GPU hangs to reset it
> through Secondary Bus Reset. This avoids rebooting the whole system. The
> below steps are the minimal to get it reproduced in the current
> Linux mainline (v6.1-rc8).
> 
> The expectation is that the rescan results similar resource allocation
> than what was done by the BIOS. What happens though is that the Linux
> resource allocation seems to allocate "bigger" windows that then does
> not fit into the BIOS allocated resources above the Downststream Port.
> 
> Steps
> -----
> 1. Boot the system up
> 2. Take lspci and iomem dumps
> 
> # lspci -vv > lspci.before
> # cp /proc/iomem iomem.before
> 
> 3. Remove the Switch Downstream Port and the GPU Endpoint
> 
> # echo 1 > /sys/bus/pci/devices/0000:50:02.0/0000:51:00.0/0000:52:01.0/remove
> 
> 4. Rescan from the Switch Upstream Port
> 
> # echo 1 > /sys/bus/pci/devices/0000:50:02.0/0000:51:00.0/rescan
> 
> 5. Take the dumps
> 
> # lspci -vv > lspci.after
> # cp /proc/iomem iomem.after
> 
> BIOS assigned resources (lspci.before)
> --------------------------------------
> 52:01.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode])
>         ...
>         Bus: primary=52, secondary=53, subordinate=54, sec-latency=0
>         I/O behind bridge: [disabled]
>         Memory behind bridge: bb800000-bb9fffff [size=2M]
>         Prefetchable memory behind bridge: 0000201c00000000-0000205e1fffffff
> [size=270848M]
> 
> 53:00.0 Display controller: Intel Corporation Device 56c0 (rev 08)
>         ...
>         Region 0: Memory at 205e1f000000 (64-bit, prefetchable) [size=16M]
>         Region 2: Memory at 201c00000000 (64-bit, prefetchable) [size=16G]
>         Expansion ROM at bb800000 [disabled] [size=2M]
>         ...
>         Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
>                 IOVCap: Migration-, Interrupt Message Number: 000
>                 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
>                 IOVSta: Migration-
>                 Initial VFs: 31, Total VFs: 31, Number of VFs: 0, Function
> Dependency Link: 00
>                 VF offset: 1, stride: 1, Device ID: 56c0
>                 Supported Page Size: 00000553, System Page Size: 00000001
>                 Region 0: Memory at 0000205e00000000 (64-bit, prefetchable)
>                 Region 2: Memory at 0000202000000000 (64-bit, prefetchable)
>                 VF Migration: offset: 00000000, BIR: 0
> 
> Linux assigned resources (lspci.after)
> --------------------------------------
> 52:01.0 PCI bridge: Intel Corporation Device 4fa4 (prog-if 00 [Normal decode])
>         ...
>         Bus: primary=52, secondary=53, subordinate=54, sec-latency=0
>         I/O behind bridge: [disabled]
>         Memory behind bridge: bb800000-bb9fffff [size=2M]
>         Prefetchable memory behind bridge: [disabled]
> 
> 53:00.0 Display controller: Intel Corporation Device 56c0 (rev 08)
>         ...
>         Region 0: Memory at <ignored> (64-bit, prefetchable)
>         Region 2: Memory at <ignored> (64-bit, prefetchable)
>         ...
>         Capabilities: [320 v1] Single Root I/O Virtualization (SR-IOV)
>                 IOVCap: Migration-, Interrupt Message Number: 000
>                 IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
>                 IOVSta: Migration-
>                 Initial VFs: 31, Total VFs: 31, Number of VFs: 0, Function
> Dependency Link: 00
>                 VF offset: 1, stride: 1, Device ID: 56c0
>                 Supported Page Size: 00000553, System Page Size: 00000001
>                 Region 0: Memory at 0000205e00000000 (64-bit, prefetchable)
>                 Region 2: Memory at 0000202000000000 (64-bit, prefetchable)
>                 VF Migration: offset: 00000000, BIR: 0
> 
> Relevant lines in dmesg
> -----------------------
> [  131.882092] i915 0000:53:00.0: PME# disabled
> [  131.882115] i915 0000:53:00.0: vgaarb: pci_notify
> [  131.997587] pci 0000:53:00.0: vgaarb: pci_notify
> [  131.997646] pcieport 0000:52:01.0: PME# disabled
> [  131.997658] pcieport 0000:52:01.0: vgaarb: pci_notify
> [  131.997675] pci 0000:52:01.0: vgaarb: pci_notify
> [  131.997690] pci 0000:53:00.0: vgaarb: pci_notify
> [  131.997788] pci 0000:53:00.0: vgaarb: pci_notify
> [  131.997811] pci 0000:53:00.0: device released
> [  131.997820] pci_bus 0000:53: busn_res: [bus 53-54] is released
> [  131.997868] pci 0000:52:01.0: vgaarb: pci_notify
> [  131.997953] pcieport 0000:51:00.0: saving config space at offset 0x0
> (reading 0x4fa08086)
> [  131.997960] pcieport 0000:51:00.0: saving config space at offset 0x4
> (reading 0x110147)
> [  131.997966] pcieport 0000:51:00.0: saving config space at offset 0x8
> (reading 0x6040001)
> [  131.997970] pcieport 0000:51:00.0: saving config space at offset 0xc
> (reading 0x10008)
> [  131.997975] pcieport 0000:51:00.0: saving config space at offset 0x10
> (reading 0x2000000c)
> [  131.997980] pcieport 0000:51:00.0: saving config space at offset 0x14
> (reading 0x205e)
> [  131.997985] pcieport 0000:51:00.0: saving config space at offset 0x18
> (reading 0x545251)
> [  131.997989] pcieport 0000:51:00.0: saving config space at offset 0x1c
> (reading 0x1f1)
> [  131.997993] pcieport 0000:51:00.0: saving config space at offset 0x20
> (reading 0xbb90bb80)
> [  131.997998] pcieport 0000:51:00.0: saving config space at offset 0x24
> (reading 0x1ff10001)
> [  131.998002] pcieport 0000:51:00.0: saving config space at offset 0x28
> (reading 0x201c)
> [  131.998007] pcieport 0000:51:00.0: saving config space at offset 0x2c
> (reading 0x205e)
> [  131.998011] pcieport 0000:51:00.0: saving config space at offset 0x30
> (reading 0x0)
> [  131.998015] pcieport 0000:51:00.0: saving config space at offset 0x34
> (reading 0x40)
> [  131.998020] pcieport 0000:51:00.0: saving config space at offset 0x38
> (reading 0x0)
> [  131.998024] pcieport 0000:51:00.0: saving config space at offset 0x3c
> (reading 0x301ff)
> [  131.998072] pcieport 0000:51:00.0: PME# enabled
> [  131.998122] pci 0000:52:01.0: vgaarb: pci_notify
> [  131.998140] pci 0000:52:01.0: device released
> [  132.009340] pcieport 0000:50:02.0: saving config space at offset 0x0
> (reading 0x347a8086)
> [  132.009353] pcieport 0000:50:02.0: saving config space at offset 0x4
> (reading 0x100547)
> [  132.009359] pcieport 0000:50:02.0: saving config space at offset 0x8
> (reading 0x6040004)
> [  132.009363] pcieport 0000:50:02.0: saving config space at offset 0xc
> (reading 0x10000)
> [  132.009368] pcieport 0000:50:02.0: saving config space at offset 0x10
> (reading 0x20800004)
> [  132.009372] pcieport 0000:50:02.0: saving config space at offset 0x14
> (reading 0x205e)
> [  132.009377] pcieport 0000:50:02.0: saving config space at offset 0x18
> (reading 0x545150)
> [  132.009381] pcieport 0000:50:02.0: saving config space at offset 0x1c
> (reading 0x200000f0)
> [  132.009385] pcieport 0000:50:02.0: saving config space at offset 0x20
> (reading 0xbb90bb80)
> [  132.009390] pcieport 0000:50:02.0: saving config space at offset 0x24
> (reading 0x20710001)
> [  132.009394] pcieport 0000:50:02.0: saving config space at offset 0x28
> (reading 0x201c)
> [  132.009398] pcieport 0000:50:02.0: saving config space at offset 0x2c
> (reading 0x205e)
> [  132.009402] pcieport 0000:50:02.0: saving config space at offset 0x30
> (reading 0x0)
> [  132.009406] pcieport 0000:50:02.0: saving config space at offset 0x34
> (reading 0x40)
> [  132.009411] pcieport 0000:50:02.0: saving config space at offset 0x38
> (reading 0x0)
> [  132.009415] pcieport 0000:50:02.0: saving config space at offset 0x3c
> (reading 0x201ff)
> [  132.009453] pcieport 0000:50:02.0: PME# enabled
> [  150.136581] pci_bus 0000:51: scanning bus
> [  150.148686] pcieport 0000:50:02.0: restoring config space at offset 0x2c
> (was 0x205e, writing 0x205e)
> [  150.148700] pcieport 0000:50:02.0: restoring config space at offset 0x28
> (was 0x201c, writing 0x201c)
> [  150.148708] pcieport 0000:50:02.0: restoring config space at offset 0x24
> (was 0x20710001, writing 0x20710001)
> [  150.148783] pcieport 0000:50:02.0: PME# disabled
> [  150.160911] pcieport 0000:51:00.0: restoring config space at offset 0x2c
> (was 0x205e, writing 0x205e)
> [  150.160925] pcieport 0000:51:00.0: restoring config space at offset 0x28
> (was 0x201c, writing 0x201c)
> [  150.160932] pcieport 0000:51:00.0: restoring config space at offset 0x24
> (was 0x1ff10001, writing 0x1ff10001)
> [  150.160967] pcieport 0000:51:00.0: PME# disabled
> [  150.160976] pcieport 0000:51:00.0: scanning [bus 52-54] behind bridge, pass
> 0
> [  150.160988] pci_bus 0000:52: scanning bus
> [  150.161024] pci 0000:52:01.0: [8086:4fa4] type 01 class 0x060400
> [  150.161219] pci 0000:52:01.0: PME# supported from D0 D3hot D3cold
> [  150.161228] pci 0000:52:01.0: PME# disabled
> [  150.161372] pci 0000:52:01.0: vgaarb: pci_notify
> [  150.161466] pci 0000:52:01.0: scanning [bus 53-54] behind bridge, pass 0
> [  150.161536] pci_bus 0000:53: scanning bus
> [  150.161565] pci 0000:53:00.0: [8086:56c0] type 00 class 0x038000
> [  150.161597] pci 0000:53:00.0: reg 0x10: [mem 0x205e1f000000-0x205e1fffffff
> 64bit pref]
> [  150.161620] pci 0000:53:00.0: reg 0x18: [mem 0x201c00000000-0x201fffffffff
> 64bit pref]
> [  150.161656] pci 0000:53:00.0: reg 0x30: [mem 0xffe00000-0xffffffff pref]
> [  150.161707] pci 0000:53:00.0: ASPM: overriding L1 acceptable latency from
> 0x0 to 0x7
> [  150.161787] pci 0000:53:00.0: PME# supported from D0 D3hot
> [  150.161794] pci 0000:53:00.0: PME# disabled
> [  150.161832] pci 0000:53:00.0: reg 0x344: [mem 0x205e00000000-0x205e00ffffff
> 64bit pref]
> [  150.161837] pci 0000:53:00.0: VF(n) BAR0 space: [mem
> 0x205e00000000-0x205e1effffff 64bit pref] (contains BAR0 for 31 VFs)
> [  150.161854] pci 0000:53:00.0: reg 0x34c: [mem 0x202000000000-0x2021ffffffff
> 64bit pref]
> [  150.161858] pci 0000:53:00.0: VF(n) BAR2 space: [mem
> 0x202000000000-0x205dffffffff 64bit pref] (contains BAR2 for 31 VFs)
> [  150.162112] pci 0000:53:00.0: vgaarb: pci_notify
> [  150.162173] pci_bus 0000:53: fixups for bus
> [  150.162177] pci 0000:52:01.0: PCI bridge to [bus 53-54]
> [  150.162187] pci 0000:52:01.0:   bridge window [mem 0xbb800000-0xbb9fffff]
> [  150.162198] pci 0000:52:01.0:   bridge window [mem
> 0x201c00000000-0x205e1fffffff 64bit pref]
> [  150.162202] pci_bus 0000:53: bus scan returning with max=53
> [  150.162210] pci 0000:52:01.0: scanning [bus 53-54] behind bridge, pass 1
> [  150.162219] pci_bus 0000:52: bus scan returning with max=54
> [  150.162225] pcieport 0000:51:00.0: scanning [bus 52-54] behind bridge, pass
> 1
> [  150.162233] pci_bus 0000:51: bus scan returning with max=54
> [  150.162240] pci 0000:52:01.0: bridge window [mem 0x200000000-0x45ffffffff
> 64bit pref] to [bus 53-54] add_size 3e00000000 add_align 200000000
> [  150.162259] pci 0000:52:01.0: BAR 15: no space for [mem size 0x8200000000
> 64bit pref]
> [  150.162265] pci 0000:52:01.0: BAR 15: failed to assign [mem size
> 0x8200000000 64bit pref]
> [  150.162270] pci 0000:52:01.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
> [  150.162278] pci 0000:52:01.0: BAR 15: no space for [mem size 0x4400000000
> 64bit pref]
> [  150.162282] pci 0000:52:01.0: BAR 15: failed to assign [mem size
> 0x4400000000 64bit pref]
> [  150.162286] pci 0000:52:01.0: BAR 14: assigned [mem 0xbb800000-0xbb9fffff]
> [  150.162295] pci 0000:53:00.0: BAR 2: no space for [mem size 0x400000000
> 64bit pref]
> [  150.162299] pci 0000:53:00.0: BAR 2: failed to assign [mem size 0x400000000
> 64bit pref]
> [  150.162304] pci 0000:53:00.0: BAR 9: no space for [mem size 0x3e00000000
> 64bit pref]
> [  150.162308] pci 0000:53:00.0: BAR 9: failed to assign [mem size 0x3e00000000
> 64bit pref]
> [  150.162313] pci 0000:53:00.0: BAR 0: no space for [mem size 0x01000000 64bit
> pref]
> [  150.162316] pci 0000:53:00.0: BAR 0: failed to assign [mem size 0x01000000
> 64bit pref]
> [  150.162321] pci 0000:53:00.0: BAR 7: no space for [mem size 0x1f000000 64bit
> pref]
> [  150.162325] pci 0000:53:00.0: BAR 7: failed to assign [mem size 0x1f000000
> 64bit pref]
> [  150.162329] pci 0000:53:00.0: BAR 6: assigned [mem 0xbb800000-0xbb9fffff
> pref]
> [  150.162336] pci 0000:53:00.0: BAR 2: no space for [mem size 0x400000000
> 64bit pref]
> [  150.162340] pci 0000:53:00.0: BAR 2: failed to assign [mem size 0x400000000
> 64bit pref]
> [  150.162345] pci 0000:53:00.0: BAR 0: no space for [mem size 0x01000000 64bit
> pref]
> [  150.162348] pci 0000:53:00.0: BAR 0: failed to assign [mem size 0x01000000
> 64bit pref]
> [  150.162352] pci 0000:53:00.0: BAR 6: assigned [mem 0xbb800000-0xbb9fffff
> pref]
> [  150.162357] pci 0000:53:00.0: BAR 9: no space for [mem size 0x3e00000000
> 64bit pref]
> [  150.162361] pci 0000:53:00.0: BAR 9: failed to assign [mem size 0x3e00000000
> 64bit pref]
> [  150.162365] pci 0000:53:00.0: BAR 7: no space for [mem size 0x1f000000 64bit
> pref]
> [  150.162369] pci 0000:53:00.0: BAR 7: failed to assign [mem size 0x1f000000
> 64bit pref]
> [  150.162374] pci 0000:52:01.0: PCI bridge to [bus 53-54]
> [  150.162382] pci 0000:52:01.0:   bridge window [mem 0xbb800000-0xbb9fffff]
> [  150.162418] pcieport 0000:52:01.0: vgaarb: pci_notify
> [  150.162426] pcieport 0000:52:01.0: runtime IRQ mapping not provided by arch
> [  150.162545] pcieport 0000:52:01.0: saving config space at offset 0x0
> (reading 0x4fa48086)
> [  150.162559] pcieport 0000:52:01.0: saving config space at offset 0x4
> (reading 0x100143)
> [  150.162565] pcieport 0000:52:01.0: saving config space at offset 0x8
> (reading 0x6040000)
> [  150.162570] pcieport 0000:52:01.0: saving config space at offset 0xc
> (reading 0x10008)
> [  150.162574] pcieport 0000:52:01.0: saving config space at offset 0x10
> (reading 0x0)
> [  150.162579] pcieport 0000:52:01.0: saving config space at offset 0x14
> (reading 0x0)
> [  150.162584] pcieport 0000:52:01.0: saving config space at offset 0x18
> (reading 0x545352)
> [  150.162589] pcieport 0000:52:01.0: saving config space at offset 0x1c
> (reading 0x200000f0)
> [  150.162594] pcieport 0000:52:01.0: saving config space at offset 0x20
> (reading 0xbb90bb80)
> [  150.162598] pcieport 0000:52:01.0: saving config space at offset 0x24
> (reading 0x1fff1)
> [  150.162603] pcieport 0000:52:01.0: saving config space at offset 0x28
> (reading 0x0)
> [  150.162607] pcieport 0000:52:01.0: saving config space at offset 0x2c
> (reading 0x0)
> [  150.162612] pcieport 0000:52:01.0: saving config space at offset 0x30
> (reading 0x0)
> [  150.162616] pcieport 0000:52:01.0: saving config space at offset 0x34
> (reading 0x40)
> [  150.162621] pcieport 0000:52:01.0: saving config space at offset 0x38
> (reading 0x0)
> [  150.162625] pcieport 0000:52:01.0: saving config space at offset 0x3c
> (reading 0x300ff)
> [  150.162766] pcieport 0000:52:01.0: vgaarb: pci_notify
> [  150.162856] i915 0000:53:00.0: vgaarb: pci_notify
> [  150.162868] i915 0000:53:00.0: runtime IRQ mapping not provided by arch
> [  150.163121] i915 0000:53:00.0: vgaarb: pci_notify



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux