[Bug 108521] RX 580 as eGPU amdgpu: gpu post error!

Comment # 21 on bug 108521 from
Hi guys,

Apologies for the deluge of posts here; I've been trying really hard to
investigate this issue!

I took a closer look at the PCI resource issues that you mentioned. I've also
been looking at Thunderbolt driver issues in general, and I've noticed that
this type of log message is quite common.  Here's what I'm wondering:

These four devices correspond to the Thunderbolt-to-PCI bridges in the system:

0000:04:00.0
0000:05:01.0
0000:05:02.0
0000:05:04.0

04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=04, secondary=05, subordinate=6e, sec-latency=0
        Memory behind bridge: bc000000-ea0fffff
        Prefetchable memory behind bridge: 0000002fb0000000-0000002ff9ffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Upstream Port, MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8 <?>
        Capabilities: [600] Latency Tolerance Reporting
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
        Memory behind bridge: ea000000-ea0fffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8 <?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Bus: primary=05, secondary=07, subordinate=39, sec-latency=0
        Memory behind bridge: bc000000-d3efffff
        Prefetchable memory behind bridge: 0000002fb0000000-0000002fcfffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8 <?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Bus: primary=05, secondary=3a, subordinate=3a, sec-latency=0
        Memory behind bridge: d3f00000-d3ffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8 <?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

05:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=05, secondary=3b, subordinate=6e, sec-latency=0
        Memory behind bridge: d4000000-e9ffffff
        Prefetchable memory behind bridge: 0000002fd0000000-0000002ff9ffffff
        Capabilities: [80] Power Management version 3
        Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [ac] Subsystem: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016]
        Capabilities: [c0] Express Downstream Port (Slot+), MSI 00
        Capabilities: [100] Device Serial Number b7-de-04-b0-a6-c9-a0-00
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [300] Virtual Channel
        Capabilities: [400] Power Budgeting <?>
        Capabilities: [500] Vendor Specific Information: ID=1234 Rev=1 Len=0d8 <?>
        Capabilities: [700] #19
        Kernel driver in use: pcieport

First, you see the PCI core defining the bridge windows for the devices:

[  104.290143] pci 0000:05:01.0: bridge window [io  0x1000-0x0fff] to [bus 07-39] add_size 1000
[  104.290152] pci 0000:05:02.0: bridge window [io  0x1000-0x0fff] to [bus 3a] add_size 1000
[  104.290155] pci 0000:05:02.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 3a] add_size 200000 add_align 100000
[  104.290169] pci 0000:05:04.0: bridge window [io  0x1000-0x0fff] to [bus 3b-6e] add_size 1000
[  104.290180] pci 0000:04:00.0: bridge window [io  0x1000-0x0fff] to [bus 05-6e] add_size 3000

Then you see a bunch of BAR errors, saying there's no space and that they can't
be assigned:

[  104.290184] pci 0000:04:00.0: BAR 13: no space for [io  size 0x3000]
[  104.290185] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x3000]
[  104.290187] pci 0000:04:00.0: BAR 13: no space for [io  size 0x3000]
[  104.290188] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x3000]
[  104.290193] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  104.290194] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  104.290196] pci 0000:05:01.0: BAR 13: no space for [io  size 0x1000]
[  104.290197] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290198] pci 0000:05:02.0: BAR 13: no space for [io  size 0x1000]
[  104.290199] pci 0000:05:02.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290201] pci 0000:05:04.0: BAR 13: no space for [io  size 0x1000]
[  104.290202] pci 0000:05:04.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290203] pci 0000:05:04.0: BAR 13: no space for [io  size 0x1000]
[  104.290205] pci 0000:05:04.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290207] pci 0000:05:02.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  104.290208] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  104.290209] pci 0000:05:02.0: BAR 13: no space for [io  size 0x1000]
[  104.290210] pci 0000:05:02.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290212] pci 0000:05:01.0: BAR 13: no space for [io  size 0x1000]
[  104.290213] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x1000]
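
To see at a glance which resource types failed on which bridge, the messages
can be grouped per device. The following is only a minimal Python sketch: the
regular expressions are my own guess at the message layout seen in this log
(other kernel versions may word things differently), and the function and
variable names are made up for illustration. It expects each log message on a
single line, as in raw dmesg output.

```python
import re
from collections import defaultdict

# Patterns modelled on the dmesg lines quoted in this comment (an assumption,
# not a stable kernel interface).
DEV = r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
FAIL_RE = re.compile(
    r"pci " + DEV + r": BAR \d+: failed to assign "
    r"\[(?P<kind>io|mem)\s+size 0x[0-9a-f]+(?P<flags>[^\]]*)\]"
)
# The trailing "$" skips the earlier sizing lines, which continue with
# "to [bus ...] add_size ..." after the closing bracket.
WINDOW_RE = re.compile(
    r"pci " + DEV + r":\s+bridge window "
    r"\[(?P<kind>io|mem)\s+0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
    r"(?P<flags>[^\]]*)\]\s*$"
)


def summarize(lines):
    """Return (failed, windows) keyed by PCI address.

    failed[dev]  -> set of resource labels that could not be assigned
    windows[dev] -> list of (label, start, end) for windows that were set up
    """
    failed = defaultdict(set)
    windows = defaultdict(list)
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            failed[m["dev"]].add((m["kind"] + m["flags"]).strip())
            continue
        m = WINDOW_RE.search(line)
        if m:
            label = (m["kind"] + m["flags"]).strip()
            windows[m["dev"]].append(
                (label, int(m["start"], 16), int(m["end"], 16))
            )
    return dict(failed), dict(windows)


# A few lines copied from the log in this comment.
SAMPLE = """\
[  104.290155] pci 0000:05:02.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 3a] add_size 200000 add_align 100000
[  104.290185] pci 0000:04:00.0: BAR 13: failed to assign [io  size 0x3000]
[  104.290194] pci 0000:05:02.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  104.290197] pci 0000:05:01.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290199] pci 0000:05:02.0: BAR 13: failed to assign [io  size 0x1000]
[  104.290237] pci 0000:05:01.0:   bridge window [mem 0xbc000000-0xd3efffff]
[  104.290241] pci 0000:05:01.0:   bridge window [mem 0x2fb0000000-0x2fcfffffff 64bit pref]
[  104.290254] pci 0000:05:02.0:   bridge window [mem 0xd3f00000-0xd3ffffff]
"""

if __name__ == "__main__":
    failed, windows = summarize(SAMPLE.splitlines())
    for dev in sorted(set(failed) | set(windows)):
        print(dev)
        print("  failed:  ", sorted(failed.get(dev, ())))
        for label, start, end in windows.get(dev, []):
            print(f"  assigned: {label} {start:#x}-{end:#x}")
```

Feeding it the full `dmesg` output instead of `SAMPLE` would show whether the
bridge the GPU actually sits behind is among those with a failed memory window,
which is the case that would worry me more than the I/O port failures.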

But then you see that the PCI bridges seem to initialize for all the devices:

[  104.290215] pci 0000:05:00.0: PCI bridge to [bus 06]
[  104.290221] pci 0000:05:00.0:   bridge window [mem 0xea000000-0xea0fffff]
[  104.290231] pci 0000:05:01.0: PCI bridge to [bus 07-39]
[  104.290237] pci 0000:05:01.0:   bridge window [mem 0xbc000000-0xd3efffff]
[  104.290241] pci 0000:05:01.0:   bridge window [mem 0x2fb0000000-0x2fcfffffff 64bit pref]
[  104.290248] pci 0000:05:02.0: PCI bridge to [bus 3a]
[  104.290254] pci 0000:05:02.0:   bridge window [mem 0xd3f00000-0xd3ffffff]
[  104.290264] pci 0000:05:04.0: PCI bridge to [bus 3b-6e]
[  104.290270] pci 0000:05:04.0:   bridge window [mem 0xd4000000-0xe9ffffff]
[  104.290274] pci 0000:05:04.0:   bridge window [mem 0x2fd0000000-0x2ff9ffffff 64bit pref]
[  104.290281] pci 0000:04:00.0: PCI bridge to [bus 05-6e]
[  104.290286] pci 0000:04:00.0:   bridge window [mem 0xbc000000-0xea0fffff]
[  104.290291] pci 0000:04:00.0:   bridge window [mem 0x2fb0000000-0x2ff9ffffff 64bit pref]

Perhaps the BAR errors are just a red herring, and at the end of the process all
of the Thunderbolt PCI bridges *are* initialized correctly?
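
One way to sanity-check that is to confirm that the downstream windows all fit
inside the upstream bridge's window and do not overlap each other. Here is a
rough Python sketch using the addresses from the log above; the helper names
are invented for illustration. Passing this check only shows that the windows
are self-consistent; it says nothing about whether they are large enough for
the BARs of the RX 580.

```python
def size_mib(start, end):
    """Size of an inclusive address range in MiB."""
    return (end - start + 1) // (1024 * 1024)


def check_nesting(parent, children):
    """Return a list of problems; an empty list means the layout is consistent.

    parent and each child are (name, start, end) with inclusive bounds.
    """
    problems = []
    p_name, p_start, p_end = parent
    ordered = sorted(children, key=lambda c: c[1])
    for name, start, end in ordered:
        if start < p_start or end > p_end:
            problems.append(f"{name} window lies outside {p_name}")
    for (a_name, _, a_end), (b_name, b_start, _) in zip(ordered, ordered[1:]):
        if b_start <= a_end:
            problems.append(f"{a_name} and {b_name} overlap")
    return problems


# Non-prefetchable memory windows, copied from the dmesg output.
MEM_PARENT = ("0000:04:00.0", 0xBC000000, 0xEA0FFFFF)
MEM_CHILDREN = [
    ("0000:05:00.0", 0xEA000000, 0xEA0FFFFF),
    ("0000:05:01.0", 0xBC000000, 0xD3EFFFFF),
    ("0000:05:02.0", 0xD3F00000, 0xD3FFFFFF),
    ("0000:05:04.0", 0xD4000000, 0xE9FFFFFF),
]

# 64-bit prefetchable windows, copied from the dmesg output.
PREF_PARENT = ("0000:04:00.0", 0x2FB0000000, 0x2FF9FFFFFF)
PREF_CHILDREN = [
    ("0000:05:01.0", 0x2FB0000000, 0x2FCFFFFFFF),
    ("0000:05:04.0", 0x2FD0000000, 0x2FF9FFFFFF),
]

if __name__ == "__main__":
    for title, parent, children in (
        ("mem", MEM_PARENT, MEM_CHILDREN),
        ("mem 64bit pref", PREF_PARENT, PREF_CHILDREN),
    ):
        print(f"{title}: parent {size_mib(parent[1], parent[2])} MiB")
        for name, start, end in children:
            print(f"  {name}: {size_mib(start, end)} MiB")
        print("  problems:", check_nesting(parent, children) or "none")
```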

As I said, I've probably spent way too much time looking at this. The main
thing I keep coming back to is that my other GPU *does* work correctly as an
eGPU.  It's also a PCIe x16 card (I know it's operating at PCIe x4 due to TB3
bandwidth limitations), so theoretically, if there were any PCI resource
problems with the Thunderbolt bridge, then the other GPU should also fail, correct?

I noticed a couple other things in my research:

I found a bug that points to TLP (specifically its power management) as causing
the same problem, with the AtomBIOS being stuck in a loop:
https://bugs.freedesktop.org/show_bug.cgi?id=103783
Perhaps the issue is caused by some sort of aggressive PM?  I might try adding
some kernel boot parameters (amdgpu.dpm=0, amdgpu.apm=0, etc.).
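
Before rebooting with those options, it may be worth checking that each
parameter name is one the loaded driver actually exposes. A small Python
sketch, assuming the usual sysfs layout `/sys/module/<module>/parameters/`;
the function names are made up. Parameters declared without sysfs permissions
do not show up there, so a missing name is a hint rather than proof, and
`modinfo -p amdgpu` is the more complete list.

```python
import os


def module_params(module, sysfs_root="/sys/module"):
    """Names of the parameters a loaded module exposes in sysfs.

    Returns an empty set if the module is not loaded or sysfs is unavailable.
    """
    path = os.path.join(sysfs_root, module, "parameters")
    if not os.path.isdir(path):
        return set()
    return set(os.listdir(path))


def unknown_params(args, module, available):
    """Parameter names in args ("module.name=value") missing from available."""
    prefix = module + "."
    unknown = []
    for arg in args:
        if not arg.startswith(prefix):
            continue  # not an option for this module
        name = arg[len(prefix):].split("=", 1)[0]
        if name not in available:
            unknown.append(name)
    return unknown


if __name__ == "__main__":
    # The two options mentioned in this comment.
    wanted = ["amdgpu.dpm=0", "amdgpu.apm=0"]
    available = module_params("amdgpu")
    if not available:
        print("amdgpu is not loaded (or exposes no parameters in sysfs)")
    else:
        missing = unknown_params(wanted, "amdgpu", available)
        print("not found in sysfs:", missing or "none")
```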

I was also thinking that perhaps I should try the AMDGPU-PRO drivers just to
see if they would work by chance.  Somebody else reported that these drivers
worked, while the amdgpu drivers failed.  It's worth a shot.

Thanks for any feedback and/or advice!
Rob

