[Bug 108521] RX 580 as eGPU amdgpu: gpu post error!

Bug ID 108521
Summary RX 580 as eGPU amdgpu: gpu post error!
Product DRI
Version unspecified
Hardware x86-64 (AMD64)
OS Linux (All)
Status NEW
Severity normal
Priority medium
Component DRM/AMDgpu
Assignee dri-devel@lists.freedesktop.org
Reporter rstrube@gmail.com

Hello everyone,

I've been attempting to get my RX 580 working correctly as an eGPU using the
Akitio Node eGPU enclosure (over Thunderbolt 3).

I've confirmed that both the Akitio Node and my laptop's Thunderbolt 3
controller are running the most up-to-date firmware. I've also been able to
successfully authorize the Thunderbolt eGPU enclosure, and I can see the RX 580
in lspci (see below):

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 05)
00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 05)
00:13.0 Non-VGA unclassified device: Intel Corporation 100 Series/C230 Series Chipset Family Integrated Sensor Hub (rev 31)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:15.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 (rev 31)
00:15.1 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation QM175 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation CM238 HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GL] (rev c0)
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
04:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
05:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
06:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
08:01.0 PCI bridge: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7)
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580]
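For anyone trying to reproduce this: the enclosure authorization mentioned above goes through the kernel's Thunderbolt sysfs interface, where each device under /sys/bus/thunderbolt/devices exposes an authorized attribute (writing 1 to it authorizes the device; tools such as boltctl typically manage this for you). The sketch below assumes that layout plus the vendor_name and device_name attributes; the helper names are my own, hypothetical ones, and the root path is a parameter so it can be exercised against a scratch directory instead of real sysfs.

```python
from __future__ import annotations

from pathlib import Path

# Where the Linux Thunderbolt subsystem exposes connected devices.
TB_SYSFS = Path("/sys/bus/thunderbolt/devices")


def _read_attr(dev: Path, name: str) -> str:
    """Read a sysfs attribute, returning '' if the device lacks it."""
    path = dev / name
    return path.read_text().strip() if path.is_file() else ""


def list_thunderbolt_devices(root: Path = TB_SYSFS) -> list[dict]:
    """Report vendor, device name and authorization state of every entry
    that has an 'authorized' attribute (entries without it are skipped)."""
    devices = []
    for dev in sorted(root.iterdir()):
        if not (dev / "authorized").is_file():
            continue
        devices.append({
            "id": dev.name,
            "vendor": _read_attr(dev, "vendor_name"),
            "device": _read_attr(dev, "device_name"),
            # '0' means not authorized; any non-zero value is treated as authorized here.
            "authorized": _read_attr(dev, "authorized") != "0",
        })
    return devices


def authorize(dev_id: str, root: Path = TB_SYSFS) -> None:
    """Authorize one device by writing '1' to its 'authorized' attribute.
    Against real sysfs this needs root privileges."""
    (root / dev_id / "authorized").write_text("1")
```

On a real system, list_thunderbolt_devices() with no arguments reads the live sysfs tree; authorize() is only needed when nothing else (such as boltd) has already authorized the enclosure.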

Looking at just the RX 580 in more detail using lspci -v, we have:

09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] (rev e7) (prog-if 00 [VGA controller])
        Subsystem: XFX Pine Group Inc. Ellesmere [Radeon RX 470/480/570/570X/580/580X]
        Flags: fast devsel, IRQ 18
        Memory at 2fb0000000 (64-bit, prefetchable) [size=256M]
        Memory at 2fc0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 2000 [size=256]
        Memory at bc000000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at bc040000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [320] Latency Tolerance Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [370] L1 PM Substates
        Kernel modules: amdgpu
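One cross-check this output allows: the 32-bit non-prefetchable region at bc000000 is 256K, i.e. 256 x 1024 = 262144 bytes, which is the same base and size amdgpu reports for its register MMIO (0xBC000000, 262144) in the dmesg excerpt that follows. So the register BAR itself seems to be assigned as the driver expects; that alone says nothing about why posting fails. A small parser for the "Memory at ..." lines, assuming lspci's format with K/M/G size suffixes (the function name is hypothetical):

```python
import re

# One "Memory at ..." line from lspci -v, e.g.
# "Memory at bc000000 (32-bit, non-prefetchable) [size=256K]"
REGION = re.compile(
    r"Memory at (?P<addr>[0-9a-f]+) "
    r"\((?P<bits>32|64)-bit, (?P<kind>non-prefetchable|prefetchable)\)"
    r"(?: \[disabled\])? \[size=(?P<size>\d+)(?P<unit>[KMG]?)\]"
)

# lspci prints region sizes with binary (power-of-two) suffixes.
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def memory_regions(lspci_v: str) -> list:
    """Return address, width, prefetchability and size in bytes of each memory BAR.
    I/O port and expansion ROM lines are ignored."""
    return [
        {
            "address": int(m["addr"], 16),
            "bits": int(m["bits"]),
            "prefetchable": m["kind"] == "prefetchable",
            "size": int(m["size"]) * UNITS[m["unit"]],
        }
        for m in REGION.finditer(lspci_v)
    ]
```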

In dmesg I see the following (I've removed non-relevant lines):

[    8.534250] amdgpu 0000:09:00.0: enabling device (0006 -> 0007)
[    8.534756] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0xC580 0xE7).
[    8.537567] [drm] register mmio base: 0xBC000000
[    8.537568] [drm] register mmio size: 262144
[    8.537598] [drm] add ip block number 0 <vi_common>
[    8.537599] [drm] add ip block number 1 <gmc_v8_0>
[    8.537599] [drm] add ip block number 2 <tonga_ih>
[    8.537599] [drm] add ip block number 3 <powerplay>
[    8.537600] [drm] add ip block number 4 <dm>
[    8.537600] [drm] add ip block number 5 <gfx_v8_0>
[    8.537601] [drm] add ip block number 6 <sdma_v3_0>
[    8.537602] [drm] add ip block number 7 <uvd_v6_0>
[    8.537602] [drm] add ip block number 8 <vce_v3_0>
[    8.537608] kfd kfd: skipped device 1002:67df, PCI rejects atomics
[    8.537630] [drm] UVD is enabled in VM mode
[    8.537630] [drm] UVD ENC is enabled in VM mode
[    8.537636] [drm] VCE enabled in VM mode
[    8.614467] ATOM BIOS: 401815-171128-QS1
[    8.614512] [drm] GPU posting now...
[   13.621276] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs aborting
[   13.621310] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing E650 (len 187, WS 0, PS 4) @ 0xE6FA
[   13.621341] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C53A (len 193, WS 4, PS 4) @ 0xC569
[   13.621359] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing C410 (len 114, WS 0, PS 8) @ 0xC47C
[   13.621361] amdgpu 0000:09:00.0: gpu post error!
[   13.621363] amdgpu 0000:09:00.0: Fatal error during GPU init
[   13.621370] [drm] amdgpu: finishing device.
[   13.621792] amdgpu: probe of 0000:09:00.0 failed with error -22

Here are my system details:

System: Dell XPS 15 2 in 1 (Kaby Lake G)
Kernel: 4.19
Mesa: 18.2.2
Xorg: 1.20.1
Built in GPUs: Intel iGPU, Vega M
eGPU: RX 580

I'm not sure whether my problems are related to the fact that my laptop *also*
contains a Vega M, which uses the amdgpu driver as well. Perhaps there's a
problem when multiple GPUs use amdgpu? One thing to point out is that the Vega M
has worked flawlessly since kernel 4.18.x.
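On the multiple-GPU question, sysfs shows which display devices amdgpu actually bound to: each entry under /sys/bus/pci/devices has a class file (base class 0x03 is a display controller) and, while a driver is bound, a driver symlink. Since the RX 580's probe fails, I'd expect 0000:09:00.0 to show no driver while the Vega M at 0000:01:00.0 shows amdgpu. This only reports binding state, not why posting fails. A minimal sketch, assuming that sysfs layout; the function name is hypothetical and the root path is a parameter so it can be tested against a scratch directory:

```python
from __future__ import annotations

from pathlib import Path

PCI_SYSFS = Path("/sys/bus/pci/devices")
DISPLAY_BASE_CLASS = 0x03  # PCI base class for display controllers


def display_devices(root: Path = PCI_SYSFS) -> list[tuple[str, str, str | None]]:
    """Return (address, vendor id, bound driver or None) for every display-class device."""
    found = []
    for dev in sorted(root.iterdir()):
        # 'class' holds the 24-bit class code, e.g. "0x030000" for a VGA controller.
        class_code = int((dev / "class").read_text(), 16)
        if class_code >> 16 != DISPLAY_BASE_CLASS:
            continue
        vendor = (dev / "vendor").read_text().strip()  # e.g. "0x1002" for AMD
        link = dev / "driver"
        # The 'driver' symlink exists only while a driver is bound to the device.
        driver = link.resolve().name if link.exists() else None
        found.append((dev.name, vendor, driver))
    return found
```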

I did run across several other users posting about this same problem when
attempting to run AMD GPUs as eGPUs.  Here's a post where a user is reporting
the same issue:

https://egpu.io/forums/thunderbolt-linux-setup/egpus-under-linux-an-advanced-guide/#post-33304

And here's another post:

https://forum.manjaro.org/t/rx-580-in-a-thunderbolt-egpu-dock/58210

I'm comfortable applying and testing kernel patches, so please feel free to ask
me to test any fixes.  I'm currently running 4.19, but could also patch a
4.18.x kernel.

Thanks!


_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
