On 18.04.24 at 12:42, Dag B wrote:
[SNIP]
Is there a good ELI13 resource explaining how resizable BAR works
in Linux?
My current kernel command-line contains: pci=assign-busses,realloc
That's a really, really bad idea. The "assign-busses" flag was
introduced to get 20-year-old laptops to see their CardBus PCI devices.
I threw a lot of mud at the wall to see what stuck. Removing it now
did not make a big difference.
Removing realloc prevents the second TB3 GPU from being initialized,
so keeping that for now.
That's really interesting. Why does it fail without that?
It basically means that your BIOS is somehow broken and only the Linux
PCI subsystem is able to assign resources correctly.
Please provide the output of "sudo lspci -v" and "sudo lspci -tv" as
file attachment (*not* inline in a mail!).
My GPU is attached via TB3 to a system for which resizable BAR is
and will remain a foreign concept in the BIOS.
What happens if you hot remove and re-plug the TB3 after the system
has started?
Much the same as during initial boot. Both good and bad. See below.
Do any of the pci=hp* options have any significance/impact on what
dmesg says below?
No, the pci=hp* options are a hint to the PCI subsystem about how much
address space to assign to each hot-plugged bridge and device from the
upstream bridge.
But if your BIOS doesn't assign anything to the upstream bridge, or you
don't have a window big enough on the root complex, you are pretty much
busted from the beginning.
And the fact that you get all those "pnp 00:0b: disabling" messages
doesn't make the BIOS look trustworthy either.
Is IO address space moveable?
No, the IO address space is usually only assigned when the device starts
with VGA emulation. And as long as VGA emulation is active you can't
move or resize anything.
What drivers usually do is turn off the VGA emulation (which also
disables the IO address space) and then resize.
Relevant kernel config/options impacting this? Is it all in the hands
of the device driver?
Well, the pci=hp* options are already all you need. But I don't think
they will help in this case.
So, so many questions. And barely competent to ask them. Please
forgive me.
Current kernel command-line snippet:
pci=realloc,hpiosize=16K,hpmemsize=64M,pcie_scan_all,hpbussize=0x33
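
To make the values on that command line easier to read, here is a minimal Python sketch that splits a pci= parameter into its options and converts size suffixes such as 16K and 64M into plain numbers. The option names come from the command line above; the helper names (parse_size, parse_pci_param) are made up for illustration, and the suffix handling only approximates what the kernel does.

```python
import re

# Binary multipliers for the K/M/G suffixes (an approximation of the
# kernel's own size parsing, not a copy of it).
_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert strings such as '16K', '64M' or '0x33' into an integer."""
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|\d+)([KMGkmg]?)", text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = m.groups()
    return int(number, 0) * _SUFFIX.get(suffix.upper(), 1)


def parse_pci_param(cmdline):
    """Return {option: value} for every pci= word on a kernel command line.

    Flags without '=' map to None; values that are not sizes stay strings.
    """
    opts = {}
    for word in cmdline.split():
        if not word.startswith("pci="):
            continue
        for item in word[len("pci="):].split(","):
            key, sep, value = item.partition("=")
            if not sep:
                opts[key] = None
                continue
            try:
                opts[key] = parse_size(value)
            except ValueError:
                opts[key] = value
    return opts


if __name__ == "__main__":
    sample = "pci=realloc,hpiosize=16K,hpmemsize=64M,pcie_scan_all,hpbussize=0x33"
    for key, value in parse_pci_param(sample).items():
        print(key, value)
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the sample string.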
I very much appreciate your input. Will try to get the attention of
the people responsible for the driver.
Thanks,
Dag B
p53 ~ # dmesg | grep 09:00.0
[ 0.471780] pci 0000:09:00.0: [10de:2204] type 00 class 0x030000
PCIe Legacy Endpoint
[ 0.471816] pci 0000:09:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 0.471844] pci 0000:09:00.0: BAR 1 [mem 0x00000000-0x0fffffff
64bit pref]
[ 0.471873] pci 0000:09:00.0: BAR 3 [mem 0x00000000-0x01ffffff
64bit pref]
[ 0.471890] pci 0000:09:00.0: BAR 5 [io 0x0000-0x007f]
[ 0.471907] pci 0000:09:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 0.472133] pci 0000:09:00.0: PME# supported from D0 D3hot
[ 0.472382] pci 0000:09:00.0: 8.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s PCIe x4 link at 0000:05:01.0 (capable of 252.048
Gb/s with 16.0 GT/s PCIe x16 link)
[ 0.491866] pci 0000:09:00.0: vgaarb: bridge control possible
[ 0.491866] pci 0000:09:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[ 0.491866] pnp 00:03: disabling [io 0x002e-0x002f] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:03: disabling [io 0x004e-0x004f] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:03: disabling [io 0x0061] because it overlaps
0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:03: disabling [io 0x0063] because it overlaps
0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:03: disabling [io 0x0065] because it overlaps
0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:03: disabling [io 0x0067] because it overlaps
0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:03: disabling [io 0x0070] because it overlaps
0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0010-0x001f] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0024-0x0025] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0028-0x0029] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x002c-0x002d] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0030-0x0031] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0034-0x0035] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0038-0x0039] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x003c-0x003d] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0050-0x0053] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.491866] pnp 00:08: disabling [io 0x0072-0x0077] because it
overlaps 0000:09:00.0 BAR 5 [io 0x0000-0x007f]
[ 0.493216] pnp 00:0b: disabling [mem 0x00000000-0x0009ffff]
because it overlaps 0000:09:00.0 BAR 0 [mem 0x00000000-0x00ffffff]
[ 0.493220] pnp 00:0b: disabling [mem 0x000c0000-0x000c3fff
disabled] because it overlaps 0000:09:00.0 BAR 0 [mem
0x00000000-0x00ffffff]
[ 0.493225] pnp 00:0b: disabling [mem 0x000c8000-0x000cbfff
disabled] because it overlaps 0000:09:00.0 BAR 0 [mem
0x00000000-0x00ffffff]
[ 0.493230] pnp 00:0b: disabling [mem 0x000d0000-0x000d3fff
disabled] because it overlaps 0000:09:00.0 BAR 0 [mem
0x00000000-0x00ffffff]
[ 0.493234] pnp 00:0b: disabling [mem 0x000d8000-0x000dbfff
disabled] because it overlaps 0000:09:00.0 BAR 0 [mem
0x00000000-0x00ffffff]
[ 0.493238] pnp 00:0b: disabling [mem 0x000e0000-0x000e3fff]
because it overlaps 0000:09:00.0 BAR 0 [mem 0x00000000-0x00ffffff]
[ 0.493242] pnp 00:0b: disabling [mem 0x000e8000-0x000ebfff]
because it overlaps 0000:09:00.0 BAR 0 [mem 0x00000000-0x00ffffff]
[ 0.493247] pnp 00:0b: disabling [mem 0x000f0000-0x000fffff]
because it overlaps 0000:09:00.0 BAR 0 [mem 0x00000000-0x00ffffff]
[ 0.493251] pnp 00:0b: disabling [mem 0x00100000-0x8f7fffff]
because it overlaps 0000:09:00.0 BAR 0 [mem 0x00000000-0x00ffffff]
[ 0.493255] pnp 00:0b: disabling [mem 0x00000000-0x0009ffff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493260] pnp 00:0b: disabling [mem 0x000c0000-0x000c3fff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493265] pnp 00:0b: disabling [mem 0x000c8000-0x000cbfff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493270] pnp 00:0b: disabling [mem 0x000d0000-0x000d3fff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493274] pnp 00:0b: disabling [mem 0x000d8000-0x000dbfff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493279] pnp 00:0b: disabling [mem 0x000e0000-0x000e3fff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493283] pnp 00:0b: disabling [mem 0x000e8000-0x000ebfff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493288] pnp 00:0b: disabling [mem 0x000f0000-0x000fffff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493292] pnp 00:0b: disabling [mem 0x00100000-0x8f7fffff
disabled] because it overlaps 0000:09:00.0 BAR 1 [mem
0x00000000-0x0fffffff 64bit pref]
[ 0.493297] pnp 00:0b: disabling [mem 0x00000000-0x0009ffff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493302] pnp 00:0b: disabling [mem 0x000c0000-0x000c3fff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493306] pnp 00:0b: disabling [mem 0x000c8000-0x000cbfff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493311] pnp 00:0b: disabling [mem 0x000d0000-0x000d3fff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493315] pnp 00:0b: disabling [mem 0x000d8000-0x000dbfff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493320] pnp 00:0b: disabling [mem 0x000e0000-0x000e3fff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493324] pnp 00:0b: disabling [mem 0x000e8000-0x000ebfff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493329] pnp 00:0b: disabling [mem 0x000f0000-0x000fffff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.493333] pnp 00:0b: disabling [mem 0x00100000-0x8f7fffff
disabled] because it overlaps 0000:09:00.0 BAR 3 [mem
0x00000000-0x01ffffff 64bit pref]
[ 0.503894] pci 0000:09:00.0: BAR 1 [mem 0x6000000000-0x600fffffff
64bit pref]: assigned
[ 0.503940] pci 0000:09:00.0: BAR 3 [mem 0x6010000000-0x6011ffffff
64bit pref]: assigned
[ 0.503963] pci 0000:09:00.0: BAR 0 [mem 0xa4000000-0xa4ffffff]:
assigned
[ 0.503972] pci 0000:09:00.0: ROM [mem 0xa5000000-0xa507ffff pref]:
assigned
[ 0.503984] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 0.503987] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 0.504331] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 0.504334] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 0.504704] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 0.504707] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 0.505073] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 0.505076] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 0.505441] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 0.505444] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 0.505810] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 0.505813] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 0.507057] pci 0000:09:00.1: D0 power state depends on 0000:09:00.0
[ 0.509437] pci 0000:09:00.0: Adding to iommu group 23
[ 2.833427] nvidia 0000:09:00.0: enabling device (0000 -> 0002)
[ 2.833519] nvidia 0000:09:00.0: vgaarb: VGA decodes changed:
olddecodes=io+mem,decodes=none:owns=none
[ 4.954613] [drm] Initialized nvidia-drm 0.0.0 20160202 for
0000:09:00.0 on minor 2
[ 228.414765] NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
[ 228.445633] pci 0000:09:00.0: Unable to change power state from
unknown to D0, device inaccessible
[ 233.991103] pci 0000:09:00.0: [10de:2204] type 00 class 0x030000
PCIe Legacy Endpoint
[ 233.993053] pci 0000:09:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 233.994986] pci 0000:09:00.0: BAR 1 [mem 0x00000000-0x0fffffff
64bit pref]
[ 233.996854] pci 0000:09:00.0: BAR 3 [mem 0x00000000-0x01ffffff
64bit pref]
[ 233.998727] pci 0000:09:00.0: BAR 5 [io 0x0000-0x007f]
[ 234.000585] pci 0000:09:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 234.002720] pci 0000:09:00.0: PME# supported from D0 D3hot
[ 234.004889] pci 0000:09:00.0: 8.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s PCIe x4 link at 0000:05:01.0 (capable of 252.048
Gb/s with 16.0 GT/s PCIe x16 link)
[ 234.007000] pci 0000:09:00.0: Adding to iommu group 23
[ 234.008925] pci 0000:09:00.0: vgaarb: bridge control possible
[ 234.010828] pci 0000:09:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[ 234.087850] pci 0000:09:00.0: BAR 1 [mem 0x6000000000-0x600fffffff
64bit pref]: assigned
[ 234.089631] pci 0000:09:00.0: BAR 3 [mem 0x6010000000-0x6011ffffff
64bit pref]: assigned
[ 234.091492] pci 0000:09:00.0: BAR 0 [mem 0xa4000000-0xa4ffffff]:
assigned
[ 234.093241] pci 0000:09:00.0: ROM [mem 0xa5000000-0xa507ffff pref]:
assigned
[ 234.096831] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 234.098652] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 234.155043] pci 0000:09:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 234.156615] pci 0000:09:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 234.183809] nvidia 0000:09:00.0: enabling device (0000 -> 0002)
[ 234.185579] nvidia 0000:09:00.0: vgaarb: VGA decodes changed:
olddecodes=io+mem,decodes=none:owns=none
[ 234.310173] pci 0000:09:00.1: D0 power state depends on 0000:09:00.0
And doing the same for the 2nd GPU:
p53 ~ # dmesg | grep 2f:00.0
[ 1.215862] pci 0000:2f:00.0: [10de:2204] type 00 class 0x030000
PCIe Legacy Endpoint
[ 1.215893] pci 0000:2f:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 1.215918] pci 0000:2f:00.0: BAR 1 [mem 0x00000000-0x0fffffff
64bit pref]
[ 1.215942] pci 0000:2f:00.0: BAR 3 [mem 0x00000000-0x01ffffff
64bit pref]
[ 1.215956] pci 0000:2f:00.0: BAR 5 [io 0x0000-0x007f]
[ 1.215970] pci 0000:2f:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 1.216765] pci 0000:2f:00.0: PME# supported from D0 D3hot
[ 1.217000] pci 0000:2f:00.0: 8.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s PCIe x4 link at 0000:05:04.0 (capable of 252.048
Gb/s with 16.0 GT/s PCIe x16 link)
[ 1.217226] pci 0000:2f:00.0: Adding to iommu group 29
[ 1.217237] pci 0000:2f:00.0: vgaarb: bridge control possible
[ 1.217238] pci 0000:2f:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[ 1.218458] pci 0000:2f:00.0: BAR 1 [mem 0x6020000000-0x602fffffff
64bit pref]: assigned
[ 1.218481] pci 0000:2f:00.0: BAR 3 [mem 0x6030000000-0x6031ffffff
64bit pref]: assigned
[ 1.218501] pci 0000:2f:00.0: BAR 0 [mem 0xb1000000-0xb1ffffff]:
assigned
[ 1.218507] pci 0000:2f:00.0: ROM [mem 0xb0800000-0xb087ffff pref]:
assigned
[ 1.218514] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 1.218514] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 1.218579] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 1.218580] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 1.219748] pci 0000:2f:00.1: D0 power state depends on 0000:2f:00.0
[ 2.883186] nvidia 0000:2f:00.0: enabling device (0000 -> 0002)
[ 2.883945] nvidia 0000:2f:00.0: vgaarb: VGA decodes changed:
olddecodes=io+mem,decodes=none:owns=none
[ 6.367931] [drm] Initialized nvidia-drm 0.0.0 20160202 for
0000:2f:00.0 on minor 3
[ 485.913085] NVRM: GPU 0000:2f:00.0: GPU has fallen off the bus.
[ 485.913727] NVRM: GPU 0000:2f:00.0: GPU serial number is
PKWUQ0B9VFK0SG.
[ 485.938963] pci 0000:2f:00.0: Unable to change power state from
unknown to D0, device inaccessible
[ 489.941767] pci 0000:2f:00.0: [10de:2204] type 00 class 0x030000
PCIe Legacy Endpoint
[ 489.944551] pci 0000:2f:00.0: BAR 0 [mem 0x00000000-0x00ffffff]
[ 489.947287] pci 0000:2f:00.0: BAR 1 [mem 0x00000000-0x0fffffff
64bit pref]
[ 489.950056] pci 0000:2f:00.0: BAR 3 [mem 0x00000000-0x01ffffff
64bit pref]
[ 489.952835] pci 0000:2f:00.0: BAR 5 [io 0x0000-0x007f]
[ 489.955655] pci 0000:2f:00.0: ROM [mem 0x00000000-0x0007ffff pref]
[ 489.958721] pci 0000:2f:00.0: PME# supported from D0 D3hot
[ 489.961746] pci 0000:2f:00.0: 8.000 Gb/s available PCIe bandwidth,
limited by 2.5 GT/s PCIe x4 link at 0000:05:04.0 (capable of 252.048
Gb/s with 16.0 GT/s PCIe x16 link)
[ 489.964859] pci 0000:2f:00.0: Adding to iommu group 29
[ 489.967703] pci 0000:2f:00.0: vgaarb: bridge control possible
[ 489.970506] pci 0000:2f:00.0: vgaarb: VGA device added:
decodes=io+mem,owns=none,locks=none
[ 490.025678] pci 0000:2f:00.0: BAR 1 [mem 0x6020000000-0x602fffffff
64bit pref]: assigned
[ 490.027887] pci 0000:2f:00.0: BAR 3 [mem 0x6030000000-0x6031ffffff
64bit pref]: assigned
[ 490.029918] pci 0000:2f:00.0: BAR 0 [mem 0xb1000000-0xb1ffffff]:
assigned
[ 490.031940] pci 0000:2f:00.0: ROM [mem 0xb0800000-0xb087ffff pref]:
assigned
[ 490.036008] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 490.038096] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 490.075208] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: can't
assign; no space
[ 490.077005] pci 0000:2f:00.0: BAR 5 [io size 0x0080]: failed to
assign
[ 490.099288] nvidia 0000:2f:00.0: enabling device (0000 -> 0002)
[ 490.101217] nvidia 0000:2f:00.0: vgaarb: VGA decodes changed:
olddecodes=io+mem,decodes=none:owns=none
[ 490.265952] pci 0000:2f:00.1: D0 power state depends on 0000:2f:00.0
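
The "available PCIe bandwidth" lines in both logs can be reproduced with a little arithmetic: transfer rate times encoding efficiency times lane count. The sketch below is only an illustration. It assumes the per-lane figure is rounded down to whole Mb/s, an assumption chosen because it matches the numbers in the logs, and the table and function name are hypothetical rather than kernel code.

```python
# Line-encoding efficiency per transfer rate (in MT/s):
# 8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 and 16 GT/s.
ENCODING = {
    2500: (8, 10),
    5000: (8, 10),
    8000: (128, 130),
    16000: (128, 130),
}


def link_bandwidth_mbps(rate_mts, lanes):
    """Usable link bandwidth in Mb/s for a transfer rate and lane count."""
    payload_bits, total_bits = ENCODING[rate_mts]
    # Assumption: per-lane throughput is truncated to an integer Mb/s.
    per_lane = rate_mts * payload_bits // total_bits
    return per_lane * lanes


if __name__ == "__main__":
    for label, rate, lanes in (("2.5 GT/s x4", 2500, 4), ("16.0 GT/s x16", 16000, 16)):
        mbps = link_bandwidth_mbps(rate, lanes)
        print(f"{label}: {mbps / 1000:.3f} Gb/s")
```

With these inputs the x4 link at 2.5 GT/s comes out at 8000 Mb/s and the x16 link at 16 GT/s at 252048 Mb/s, the same figures the kernel reports for both GPUs.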
BAR 5 is missing in the lspci output. Same for both.
lspci specifies 'Physical Resizable'. Is that implied for all BARs?
No, that means only BAR1 is resizeable.
Regards,
Christian.
Capabilities: [bb0 v1] Physical Resizable BAR
BAR 0: current size: 16MB, supported: 16MB
BAR 1: current size: 256MB, supported: 64MB 128MB 256MB 512MB
1GB 2GB 4GB 8GB 16GB 32GB
BAR 3: current size: 32MB, supported: 32MB
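
For reference, the Resizable BAR capability does not store sizes in bytes. It uses a small integer n meaning 2^(n+20) bytes, so 0 is 1MB, 8 is 256MB and 15 is 32GB, and the supported sizes form a bitmask over those values. A minimal Python sketch of that decoding follows; the function names are made up for illustration, and the mask 0xffc0 is simply the value that corresponds to the BAR 1 list above, not something read from the device.

```python
def rebar_size_bytes(encoding):
    """Size in bytes for a Resizable BAR size encoding (0 means 1MB)."""
    return 1 << (encoding + 20)


def human(size):
    """Format a power-of-two size the way the lspci output above does."""
    if size >= 1 << 30:
        return f"{size >> 30}GB"
    return f"{size >> 20}MB"


def decode_supported(mask):
    """List the sizes (in bytes) whose bit is set in a supported-sizes mask."""
    return [rebar_size_bytes(bit) for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Bits 6..15 set: 64MB up to 32GB, matching the BAR 1 line above.
    bar1_mask = 0xFFC0
    print(" ".join(human(s) for s in decode_supported(bar1_mask)))
```

BAR 0 and BAR 3 each list a single supported size, which is why only BAR 1 can actually be resized.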