Christian,

Thanks for the response. That got me in the right direction.

After trial and error I found the cause - the Thunderbolt Boot Support option must be disabled in the BIOS.
If I disable it I can boot to Ubuntu and it looks like amdgpu inits okay.
If I enable it with no other changes, init fails.

The last issue was one of my own - forgetting to use DRI_PRIME and xrandr correctly.

Happy to say the Red Devil is working now in eGPU mode!
It's about a 20% perf loss versus a PCI-E slot, right in line with our previous tests.

As always thank you for your continued time and support.
We'll be happy to give a shout out to you guys for the help at article/video time.

Respectfully,

Daniel S. Moran (garwynn)
PC Hardware Editor - XDA-Developers
Phone: 1-559-316-0760/+81-90-5484-4155
Article Links: http://www.xda-developers.com/author/garwynn
E-mail: xdagarwynn at gmail.com | Twitter: @xdagarwynn

On Mon, Apr 9, 2018 at 10:48 PM, Christian König <christian.koenig at amd.com> wrote:

> Hi Daniel,
>
> your problem is that the system BIOS is buggy and doesn't assign resources
> to the card:
>
> Region 0: Memory at <ignored> (64-bit, prefetchable)
> Region 2: Memory at <ignored> (64-bit, prefetchable)
> Region 4: I/O ports at 9000 [size=256]
> Region 5: Memory at <ignored> (32-bit, non-prefetchable)
> Expansion ROM at <ignored> [disabled]
>
>
> The kernel actually tries to assign resources to the bridges, but fails as
> well because the BIOS didn't reserve any during startup.
>
> [ 0.179743] pci 0000:12:00.0: can't claim BAR 14 [mem 0x01c00000-0xef0fffff]: no compatible bridge window
> [ 0.179745] pci 0000:12:00.0: [mem 0x01c00000-0xef0fffff] clipped to [mem 0xef000000-0xef0fffff]
> [ 0.179747] pci 0000:12:00.0: bridge window [mem 0xef000000-0xef0fffff]
> [ 0.179751] pci 0000:13:01.0: can't claim BAR 14 [mem 0x01c00000-0x01ffffff]: no compatible bridge window
> [ 0.179753] pci 0000:14:00.0: can't claim BAR 14 [mem 0x01c00000-0x01ffffff]: no compatible bridge window
> [ 0.179754] pci 0000:15:00.0: can't claim BAR 14 [mem 0x01d00000-0x01dfffff]: no compatible bridge window
> [ 0.179756] pci 0000:08:04.0: can't claim BAR 13 [io 0xb000-0xcfff]: address conflict with PCI Bus 0000:12 [io 0x9000-0xbfff]
> [ 0.179782] pci 0000:14:00.0: can't claim BAR 0 [mem 0x01c00000-0x01c03fff]: no compatible bridge window
> [ 0.179789] pci 0000:16:00.0: can't claim BAR 0 [mem 0xd0000000-0xdfffffff 64bit pref]: no compatible bridge window
> [ 0.179791] pci 0000:16:00.0: can't claim BAR 2 [mem 0xe0200000-0xe03fffff 64bit pref]: no compatible bridge window
> [ 0.179793] pci 0000:16:00.0: can't claim BAR 5 [mem 0x01d00000-0x01d7ffff]: no compatible bridge window
> [ 0.179798] pci 0000:16:00.1: can't claim BAR 0 [mem 0x01da0000-0x01da3fff]: no compatible bridge window
>
>
> There isn't much you can do except for trying to update the BIOS and, if
> that doesn't help, replace your motherboard.
>
> Regards,
> Christian.
>
>
> On 09.04.2018 at 15:33, Daniel Moran wrote:
>
> Christian,
> Andrey,
>
> Thank you for the responses.
> Here's the requested dmesg/lspci. Also pulled journalctl just in case but
> didn't see anything that stands out.
>
> I'll take another look at the BIOS settings to see if anything else may
> explain the memory error.
> I've got 16GB in the system at the moment, can bump up to 32 - also added
> a larger swap just in case that was the issue. (No change.)
>
> As always thank you for your continued time and support.
>
> Respectfully,
>
> Daniel S. Moran (garwynn)
>
> On Mon, Apr 9, 2018 at 3:52 PM, Christian König <christian.koenig at amd.com>
> wrote:
>
>> Please provide the full dmesg of the system as well as the output of
>> "lspci -s 0000:16:00.0 -vvvv" as attachment.
>>
>> Thanks,
>> Christian.
>>
>> On 09.04.2018 at 06:00, Andrey Grodzovsky wrote:
>>
>> Just from a quick look it seems to fail in amdgpu_device_init->ioremap
>> with ENOMEM, that would explain why you don't see any more prints - this
>> failure is very early in the device init process.
>>
>> No idea why ioremap would fail in this case and not even sure which
>> implementation of ioremap to look into for your case.
>>
>> Adding Christian for this.
>>
>> Andrey
>>
>> On 04/07/2018 03:16 AM, Daniel Moran wrote:
>>
>> Also, to clarify... if I move it into a regular slot and turn off the eGPU,
>> it works as expected.
>> Tested with Intel iGPU enabled and disabled, made sure i915 loaded
>> without error and can connect display to it.
>>
>> Again, thank you in advance for any time/support offered.
>>
>> Respectfully,
>>
>> Daniel S. Moran (garwynn)
>>
>> On Sat, Apr 7, 2018 at 3:58 PM, Daniel Moran <xdagarwynn at gmail.com>
>> wrote:
>>
>>> Hello all,
>>>
>>> I've got a Powercolor Red Devil Vega 56 here that I'm trying to get
>>> working in eGPU mode.
>>> I think on the BIOS/hardware side it's now all fleshed out.
>>> Now I'm at a point where amdgpu tries to init and reaches a fatal error.
>>>
>>> Setting loglevel=8 doesn't produce any additional messages.
>>> Here's what it does report (full dmesg attached):
>>>
>>> [ 429.005909] [drm] amdgpu kernel modesetting enabled.
>>> [ 429.006080] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x148C:0x2388 0xC3).
>>> [ 429.006082] amdgpu 0000:16:00.0: Fatal error during GPU init
>>> [ 429.006155] amdgpu: probe of 0000:16:00.0 failed with error -12
>>>
>>> Using the following commands to unload & reload for testing. Since it's
>>> an eGPU I'm using the i7-7700K iGPU (i915 module) as the primary and
>>> these commands work in terminal without requiring a reboot.
>>>
>>> sudo rmmod amdgpu
>>> sudo modprobe -v amdgpu
>>>
>>> Pulled the UMR and tried to make, fails on Cmake. I'll attach the log as
>>> a text file.
>>> Also will attach a full dmesg and lspci dump. uname -a below:
>>> Linux testbox 4.15.15-041515-generic #201803311331 SMP Sat Mar 31 17:34:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Any other ideas on how I can debug this further? Feel I'm so close,
>>> don't want to let this go.
>>> Thank you in advance for your time.
>>>
>>> Respectfully,
>>>
>>> Daniel S. Moran (garwynn)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2018-04-07 16-08-59.png
Type: image/png
Size: 60529 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180410/758f2b35/attachment-0001.png>
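
For anyone who lands on this thread with the same "probe ... failed with error -12" symptom: the unclaimed-BAR messages Christian quotes can be pulled out of a kernel log with a small filter. This is only a sketch. The function name unclaimed_bars is hypothetical (it is not part of any tool mentioned in the thread), and it assumes the log uses the same "can't claim BAR" wording as the dmesg excerpt quoted above.

```shell
# Hypothetical helper: reads a kernel log on stdin and prints, once each,
# the PCI addresses of devices that logged a "can't claim BAR" message.
# Assumes the message format seen in this thread's dmesg excerpt.
unclaimed_bars() {
    grep "can't claim BAR" \
        | sed -E 's/.*pci ([0-9a-f:.]+): .*/\1/' \
        | sort -u
}
```

It could be fed a live log with something like "dmesg | unclaimed_bars" (dmesg may need sudo on some systems) or a saved log file via shell redirection. If the eGPU's own address, 0000:16:00.0 in this thread, shows up in the output, that points at the same resource-assignment problem discussed here rather than at the amdgpu driver itself.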