Re: [error] Drm -> amdgpu Unrecoverable Machine Check

Hello Christian,
The output of "setpci -s 0001:01:00.0 ECAP15+4.l ECAP15+8.l" is:

0001f000                                                                                                          
00000820

Regards.
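
For readers following along, the two values above can be decoded by hand. This is a minimal sketch, assuming ECAP15 refers to the PCIe Resizable BAR extended capability and that offsets +4 and +8 are its capability and control registers. The bit layout used here (supported sizes starting at bit 4, BAR index in control bits 2:0, number of resizable BARs in bits 7:5, current size in bits 12:8) is my understanding of that capability, not something stated in this thread, and the function name decode_rebar is made up for illustration.

```python
def decode_rebar(cap: int, ctrl: int) -> dict:
    """Decode Resizable BAR capability/control values (layout is an assumption)."""
    # Capability register: bit (4 + n) set means a BAR size of 2**n MiB is supported.
    supported_mib = [1 << n for n in range(20) if cap & (1 << (n + 4))]
    return {
        "supported_mib": supported_mib,
        "bar_index": ctrl & 0x7,                  # which BAR this entry describes
        "num_resizable_bars": (ctrl >> 5) & 0x7,  # only meaningful in the first entry
        "current_mib": 1 << ((ctrl >> 8) & 0x1F), # current size is 2**field MiB
    }


info = decode_rebar(0x0001F000, 0x00000820)
for key, value in info.items():
    print(f"{key}: {value}")
```

Under these assumptions the card advertises BAR sizes from 256 MB up to 4 GB, and BAR 0 currently sits at 256 MB, the smallest advertised size. That would fit the remark quoted below that the VRAM BAR can usually only be made larger, not smaller.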



On Mon, 2 Dec 2019 at 19:31, Yusuf Altıparmak <yusufalti1997@xxxxxxxxx> wrote:
Most likely not. There is support for resizing the VRAM BAR, but usually you can only make it larger and not smaller.
Please give me the output of "sudo setpci -s 0001:01:00.0 ECAP15+4.l ECAP15+8.l" if you want to double check that.

Okay, I'll try it tomorrow. What exactly does the "sudo setpci -s 0001:01:00.0 ECAP15+4.l ECAP15+8.l" command do?

 
Well you rather need to ask if anybody has sample PCIe configuration for GPUs in general. That problem is not really E9171 related. You might want to ask NXP for that maybe.
Sorry, no idea if that is correct or not. You need to ask NXP for help with that.


Okay, no problem. At least I now know what the missing piece is. The problem is probably in the .dtsi and U-Boot config files. The memory ranges are overlapping, like you said. I'll ask NXP for a sample PCIe configuration for GPUs.
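
The overlap suspicion can be checked mechanically. This is a sketch, assuming the four CPU-side memory window bases printed in the 256 MB boot log further down in this thread (0xc00000000 onward, 256 MB apart). The 4 GB case is hypothetical: the modified .dtsi is not shown in the thread, so the example assumes only the window size was enlarged while the bases stayed where they were. The names find_overlaps and make_windows are made up for illustration.

```python
from itertools import combinations

# CPU-side window bases per host bridge, taken from the 256 MB boot log.
BASES = {
    "pcie@ffe240000": 0xC00000000,
    "pcie@ffe250000": 0xC10000000,
    "pcie@ffe260000": 0xC20000000,
    "pcie@ffe270000": 0xC30000000,
}


def make_windows(size):
    """Give every host bridge a window of the same size at its logged base."""
    return [(name, base, size) for name, base in BASES.items()]


def find_overlaps(windows):
    """Return pairs of names whose [base, base + size) ranges intersect."""
    hits = []
    for (name_a, base_a, size_a), (name_b, base_b, size_b) in combinations(windows, 2):
        if base_a < base_b + size_b and base_b < base_a + size_a:
            hits.append((name_a, name_b))
    return hits


print("256 MB windows:", find_overlaps(make_windows(256 << 20)))
print("4 GB windows (hypothetical):", find_overlaps(make_windows(4 << 30)))
```

With 256 MB windows the ranges sit back to back without touching. If the size alone is raised to 4 GB, every window covers its neighbours, so the bases would have to be spread at least 4 GB apart as well.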

Thank you for your interest, Christian.
Regards.
 

On 02.12.19 at 14:32, Yusuf Altıparmak wrote:

I attached my dts file.

System is working fine when GPU is not plugged in.

This is the last console log before freeze:
[drm] amdgpu kernel modesetting enabled.                                              
[drm] initializing kernel modesetting (POLARIS12 0x1002:0x6987 0x1787:0x2389 0x80). 
[drm] register mmio base: 0x20200000                                                  
fsl-fman-port ffe488000.port fm1-gb0: renamed from eth0                              
[drm] register mmio size: 262144                                                      
[drm] add ip block number 0 <vi_common>                                              
[drm] add ip block number 1 <gmc_v8_0>                                                
[drm] add ip block number 2 <tonga_ih>                                                
[drm] add ip block number 3 <powerplay>                                              
[drm] add ip block number 4 <dm>                                                      
[drm] add ip block number 5 <gfx_v8_0>                                                
[drm] add ip block number 6 <sdma_v3_0>                                              
[drm] add ip block number 7 <uvd_v6_0>                                                
[drm] add ip block number 8 <vce_v3_0>                                                
[drm] UVD is enabled in VM mode                                                      
[drm] UVD ENC is enabled in VM mode                                                  
[drm] VCE enabled in VM mode                                                          
ATOM BIOS: 113-ER16BFC-001                                                            
[drm] GPU posting now...                                                              
Disabling lock debugging due to kernel taint                                          
Machine check in kernel mode.                                                        
Caused by (from MCSR=a000): Load Error Report                                        
Guarded Load Error Report                                                            
Kernel panic - not syncing: Unrecoverable Machine check                              
CPU: 1 PID: 2023 Comm: udevd Tainted: G   M              4.19.26+gc0c2141 #1          
Call Trace:      
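
The MCSR value in the panic above can be split into the two causes the kernel printed. This is a sketch, assuming the bit values 0x8000 for "Load Error Report" and 0x2000 for "Guarded Load Error Report". Those values are my recollection of the e500mc machine check status bits, not something stated in this thread, and only the two causes named in the log are covered. The function name decode_mcsr is made up for illustration.

```python
# Assumed bit values; only the two causes named in the log are listed.
MCSR_FLAGS = {
    0x8000: "Load Error Report",
    0x2000: "Guarded Load Error Report",
}


def decode_mcsr(value):
    """Return the known cause names and any leftover bits not in MCSR_FLAGS."""
    names = [name for bit, name in MCSR_FLAGS.items() if value & bit]
    known_mask = 0
    for bit in MCSR_FLAGS:
        known_mask |= bit
    return names, value & ~known_mask


names, unknown = decode_mcsr(0xA000)
print(names, hex(unknown))
```

Under this assumption 0xa000 is exactly those two bits, meaning the machine check was raised by a load instruction rather than a store or an instruction fetch. A load from a device region that nothing answers is one plausible way to get there, which would fit the memory window discussion in this thread.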
                    

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



On Mon, 2 Dec 2019 at 15:28, Christian König <ckoenig.leichtzumerken@xxxxxxxxx> wrote:
Hi Yusuf,

On 02.12.19 at 12:41, Yusuf Altıparmak wrote:
My embedded board freezes when I plug an E9171 into PCIe. What does the Unrecoverable Machine Check error mean in relation to the GPU?

Well see the explanation on Wikipedia for example: https://en.wikipedia.org/wiki/Machine-check_exception

In general it means you have messed up something in your hardware configuration.

Could the PCIe settings in the .dts file cause this problem?

Possible, but rather unlikely. My best guess is that it is some problem with the power supply.

If so, is there a sample PCIe configuration for the E9171?

The E9171 is just a PCIe device, so the dtsi is actually rather uninteresting. What we really need is a full dmesg and maybe lspci output would help as well.

Regards,
Christian.


Hi Christian,

First, I am using an NXP T1042D4RDB-64B, which has a 256 MB PCIe buffer according to its. The PCIe memory range was set to 256 MB in the .dts file and in the U-Boot configuration file. The driver failed with error code -12 (ENOMEM, out of memory), but I was able to reach the Linux console.

[    5.512922] [drm] amdgpu kernel modesetting enabled.
[    5.517065] [drm] initializing kernel modesetting (POLARIS12 0x1002:0x6987 0x1787:0x2389 0x80).
[    5.524507] amdgpu 0001:01:00.0: Fatal error during GPU init
[    5.529296] amdgpu: probe of 0001:01:00.0 failed with error -12
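
The -12 in the probe failure is a negated Linux errno value. A small sketch for looking up such codes with the Python standard library, assuming it is run on a Linux host where errno 12 is defined. The helper name describe_kernel_error is made up for illustration.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a kernel-style negative return code to its errno name and message."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = describe_kernel_error(-12)
print(f"-12 -> {name}: {message}")
```

The symbolic name is ENOMEM. In a driver probe path it does not necessarily mean system RAM ran out. It can also be returned when a resource such as a BAR mapping cannot be set up.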

Then I changed 256 MB to 4 GB in the .dtsi and U-Boot configuration files. I also changed the I/O size from 64 KB to 1 MB. With these settings I was not able to reach the Linux console because the board froze, but the driver loaded successfully this time. The console log from that run is the one I already posted above.

This is the lspci -v output when the GPU is plugged in and the memory size is 256 MB:

root@t1042d4rdb-64b:~# lspci -v
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/pcie@ffe240000/pcie@0
        Flags: bus master, fast devsel, latency 0, IRQ 20
        Memory at <ignored> (32-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-0000ffff [size=64K]
        Memory behind bridge: e0000000-efffffff [size=256M]
        Prefetchable memory behind bridge: None
        Capabilities: [44] Power Management version 3
        Capabilities: [4c] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: pcieport

0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/pcie@ffe250000/pcie@0
        Flags: bus master, fast devsel, latency 0, IRQ 21
        Memory at <ignored> (32-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-0000ffff [size=64K]
        Memory behind bridge: e0000000-efffffff [size=256M]
        Prefetchable memory behind bridge: None
        Capabilities: [44] Power Management version 3
        Capabilities: [4c] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: pcieport

0001:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Lexa [Radeon E9171 MCM] (rev 80) (prog-if 00 [VGA controller])
        Subsystem: Hightech Information System Ltd. Device 2389
        Flags: fast devsel, IRQ 41
        Memory at c10000000 (64-bit, prefetchable) [size=256M]
        Memory at <ignored> (64-bit, prefetchable)
        I/O ports at 1100 [size=256]
        Memory at <ignored> (32-bit, non-prefetchable)
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] Resizable BAR <?>
        Capabilities: [270] Secondary PCI Express <?>
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Capabilities: [320] Latency Tolerance Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [370] L1 PM Substates
        Kernel modules: amdgpu

0001:01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aae0
        Subsystem: Hightech Information System Ltd. Device aae0
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at <ignored> (64-bit, non-prefetchable)
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [328] Alternative Routing-ID Interpretation (ARI)

0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/pcie@ffe260000/pcie@0
        Flags: bus master, fast devsel, latency 0, IRQ 22
        Memory at <ignored> (32-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-0000ffff [size=64K]
        Memory behind bridge: e0000000-efffffff [size=256M]
        Prefetchable memory behind bridge: None
        Capabilities: [44] Power Management version 3
        Capabilities: [4c] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: pcieport

0003:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0824 (rev 11) (prog-if 00 [Normal decode])
        Device tree node: /sys/firmware/devicetree/base/pcie@ffe270000/pcie@0
        Flags: bus master, fast devsel, latency 0, IRQ 23
        Memory at <ignored> (32-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-0000ffff [size=64K]
        Memory behind bridge: e0000000-efffffff [size=256M]
        Prefetchable memory behind bridge: None
        Capabilities: [44] Power Management version 3
        Capabilities: [4c] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: pcieport

And this is the PCIe part of dmesg when the memory range is 256 MB. The same messages appear when the memory range is set to 4 GB:

PCI host bridge /pcie@ffe240000  ranges:
 MEM 0x0000000c00000000..0x0000000c0fffffff -> 0x00000000e0000000
  IO 0x0000000ff8000000..0x0000000ff800ffff -> 0x0000000000000000
/pcie@ffe240000: PCICSRBAR @ 0xff000000
setup_pci_atmu: end of DRAM 200000000
/pcie@ffe240000: Setup 64-bit PCI DMA window
/pcie@ffe240000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
/pcie@ffe240000: DMA window size is 0xe0000000
Found FSL PCI host bridge at 0x0000000ffe250000. Firmware bus number: 0->1
PCI host bridge /pcie@ffe250000  ranges:
 MEM 0x0000000c10000000..0x0000000c1fffffff -> 0x00000000e0000000
  IO 0x0000000ff8010000..0x0000000ff801ffff -> 0x0000000000000000
/pcie@ffe250000: PCICSRBAR @ 0xff000000
setup_pci_atmu: end of DRAM 200000000
/pcie@ffe250000: Setup 64-bit PCI DMA window
/pcie@ffe250000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
/pcie@ffe250000: DMA window size is 0xe0000000
Found FSL PCI host bridge at 0x0000000ffe260000. Firmware bus number: 0->0
PCI host bridge /pcie@ffe260000  ranges:
 MEM 0x0000000c20000000..0x0000000c2fffffff -> 0x00000000e0000000
  IO 0x0000000ff8020000..0x0000000ff802ffff -> 0x0000000000000000
/pcie@ffe260000: PCICSRBAR @ 0xff000000
setup_pci_atmu: end of DRAM 200000000
/pcie@ffe260000: Setup 64-bit PCI DMA window
/pcie@ffe260000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
/pcie@ffe260000: DMA window size is 0xe0000000
Found FSL PCI host bridge at 0x0000000ffe270000. Firmware bus number: 0->0
PCI host bridge /pcie@ffe270000  ranges:
 MEM 0x0000000c30000000..0x0000000c3fffffff -> 0x00000000e0000000
  IO 0x0000000ff8030000..0x0000000ff803ffff -> 0x0000000000000000
/pcie@ffe270000: PCICSRBAR @ 0xff000000
setup_pci_atmu: end of DRAM 200000000
/pcie@ffe270000: Setup 64-bit PCI DMA window
/pcie@ffe270000: WARNING: Outbound window cfg leaves gaps in memory map. Adjusting the memory map could reduce unnecessary bounce buffering.
/pcie@ffe270000: DMA window size is 0xe0000000
iommu: Adding device ff6000000.qman-portal to group 0
iommu: Adding device ff6004000.qman-portal to group 1
iommu: Adding device ff6008000.qman-portal to group 2
iommu: Adding device ff600c000.qman-portal to group 3
iommu: Adding device ff6010000.qman-portal to group 4
iommu: Adding device ff6014000.qman-portal to group 5
iommu: Adding device ff6018000.qman-portal to group 6
iommu: Adding device ff601c000.qman-portal to group 7
iommu: Adding device ff6020000.qman-portal to group 8
iommu: Adding device ff6024000.qman-portal to group 9
iommu: Adding device ffe100300.dma to group 10
iommu: Adding device ffe101300.dma to group 11
iommu: Adding device ffe114000.sdhc to group 12
iommu: Adding device ffe210000.usb to group 13
iommu: Adding device ffe211000.usb to group 14
iommu: Adding device ffe220000.sata to group 15
iommu: Adding device ffe221000.sata to group 16
iommu: Adding device ffe318000.qman to group 17
iommu: Adding device ffe31a000.bman to group 18
iommu: Adding device ffe240000.pcie to group 19
iommu: Adding device ffe250000.pcie to group 20
iommu: Adding device ffe260000.pcie to group 21
iommu: Adding device ffe270000.pcie to group 22
iommu: Adding device ffe140000.qe to group 23
software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
PCI: Probing PCI hardware
fsl-pci ffe240000.pcie: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x8000080000010000-0x800008000001ffff] (bus address [0x0000-0xffff])
pci_bus 0000:00: root bus resource [mem 0xc00000000-0xc0fffffff] (bus address [0xe0000000-0xefffffff])
pci_bus 0000:00: root bus resource [bus 00]
iommu: Removing device ffe240000.pcie from group 19
iommu: Adding device 0000:00:00.0 to group 24
pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
fsl-pci ffe250000.pcie: PCI host bridge to bus 0001:00
pci_bus 0001:00: root bus resource [io  0x8000080000021000-0x8000080000030fff] (bus address [0x0000-0xffff])
pci_bus 0001:00: root bus resource [mem 0xc10000000-0xc1fffffff] (bus address [0xe0000000-0xefffffff])
pci_bus 0001:00: root bus resource [bus 00-01]
iommu: Removing device ffe250000.pcie from group 20
iommu: Adding device 0001:00:00.0 to group 19
pci 0001:01:00.0: enabling Extended Tags
pci 0001:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x1 link at 0001:00:00.0 (capable of 63.008 Gb/s with 8 GT/s x8 link)
iommu: Adding device 0001:01:00.0 to group 19
pci 0001:01:00.1: enabling Extended Tags
iommu: Adding device 0001:01:00.1 to group 19
pci 0001:00:00.0: PCI bridge to [bus 01-ff]
fsl-pci ffe260000.pcie: PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [io  0x8000080000032000-0x8000080000041fff] (bus address [0x0000-0xffff])
pci_bus 0002:00: root bus resource [mem 0xc20000000-0xc2fffffff] (bus address [0xe0000000-0xefffffff])
pci_bus 0002:00: root bus resource [bus 00]
iommu: Removing device ffe260000.pcie from group 21
iommu: Adding device 0002:00:00.0 to group 20
pci 0002:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0002:00:00.0: PCI bridge to [bus 01-ff]
fsl-pci ffe270000.pcie: PCI host bridge to bus 0003:00
pci_bus 0003:00: root bus resource [io  0x8000080000043000-0x8000080000052fff] (bus address [0x0000-0xffff])
pci_bus 0003:00: root bus resource [mem 0xc30000000-0xc3fffffff] (bus address [0xe0000000-0xefffffff])
pci_bus 0003:00: root bus resource [bus 00]
iommu: Removing device ffe270000.pcie from group 22
iommu: Adding device 0003:00:00.0 to group 21
pci 0003:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0003:00:00.0: PCI bridge to [bus 01-ff]
PCI: Cannot allocate resource region 0 of device 0000:00:00.0, will remap
PCI: Cannot allocate resource region 0 of device 0001:00:00.0, will remap
PCI: Cannot allocate resource region 2 of device 0001:01:00.0, will remap
PCI: Cannot allocate resource region 5 of device 0001:01:00.0, will remap
PCI: Cannot allocate resource region 6 of device 0001:01:00.0, will remap
PCI: Cannot allocate resource region 0 of device 0001:01:00.1, will remap
PCI: Cannot allocate resource region 0 of device 0002:00:00.0, will remap
PCI: Cannot allocate resource region 0 of device 0003:00:00.0, will remap
pci 0000:00:00.0: BAR 0: no space for [mem size 0x01000000]
pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
pci 0000:00:00.0: PCI bridge to [bus 01]
pci 0000:00:00.0:   bridge window [io  0x8000080000010000-0x800008000001ffff]
pci 0000:00:00.0:   bridge window [mem 0xc00000000-0xc0fffffff]
pci_bus 0000:00: Some PCI device resources are unassigned, try booting with pci=realloc
pci 0001:00:00.0: BAR 0: no space for [mem size 0x01000000]
pci 0001:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
pci 0001:00:00.0: BAR 9: no space for [mem size 0x00200000 64bit pref]
pci 0001:00:00.0: BAR 9: failed to assign [mem size 0x00200000 64bit pref]
pci 0001:01:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
pci 0001:01:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
pci 0001:01:00.0: BAR 5: no space for [mem size 0x00040000]
pci 0001:01:00.0: BAR 5: failed to assign [mem size 0x00040000]
pci 0001:01:00.0: BAR 6: no space for [mem size 0x00020000 pref]
pci 0001:01:00.0: BAR 6: failed to assign [mem size 0x00020000 pref]
pci 0001:01:00.1: BAR 0: no space for [mem size 0x00004000 64bit]
pci 0001:01:00.1: BAR 0: failed to assign [mem size 0x00004000 64bit]
pci 0001:00:00.0: PCI bridge to [bus 01]
pci 0001:00:00.0:   bridge window [io  0x8000080000021000-0x8000080000030fff]
pci 0001:00:00.0:   bridge window [mem 0xc10000000-0xc1fffffff]
pci_bus 0001:00: Some PCI device resources are unassigned, try booting with pci=realloc
pci 0002:00:00.0: BAR 0: no space for [mem size 0x01000000]
pci 0002:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
pci 0002:00:00.0: PCI bridge to [bus 01]
pci 0002:00:00.0:   bridge window [io  0x8000080000032000-0x8000080000041fff]
pci 0002:00:00.0:   bridge window [mem 0xc20000000-0xc2fffffff]
pci_bus 0002:00: Some PCI device resources are unassigned, try booting with pci=realloc
pci 0003:00:00.0: BAR 0: no space for [mem size 0x01000000]
pci 0003:00:00.0: BAR 0: failed to assign [mem size 0x01000000]
pci 0003:00:00.0: PCI bridge to [bus 01]
pci 0003:00:00.0:   bridge window [io  0x8000080000043000-0x8000080000052fff]
pci 0003:00:00.0:   bridge window [mem 0xc30000000-0xc3fffffff]
pci_bus 0003:00: Some PCI device resources are unassigned, try booting with pci=realloc
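
The "no space" messages for domain 0001 can be checked with simple arithmetic. This is a sketch using only numbers from the logs in this message: the 256 MB bridge window, the 256 MB GPU BAR 0 that lspci shows as assigned at c10000000, and the BAR sizes the kernel failed to assign. The bridge's BAR 9 line is left out because it is a bridge window rather than a separate device BAR. The function name summarize is made up for illustration.

```python
# Bridge memory window for domain 0001, from "bridge window [mem 0xc10000000-0xc1fffffff]".
WINDOW_SIZE = 0xC1FFFFFFF - 0xC10000000 + 1

# GPU BAR 0, from lspci: "Memory at c10000000 (64-bit, prefetchable) [size=256M]".
GPU_BAR0_SIZE = 256 << 20

# Requests the kernel reported as "no space" in the same domain.
UNASSIGNED = {
    "0001:00:00.0 BAR 0": 0x01000000,
    "0001:01:00.0 BAR 2": 0x00200000,
    "0001:01:00.0 BAR 5": 0x00040000,
    "0001:01:00.0 BAR 6": 0x00020000,
    "0001:01:00.1 BAR 0": 0x00004000,
}


def summarize(window, assigned, unassigned):
    """Return (bytes left in the window, bytes still requested)."""
    return window - assigned, sum(unassigned.values())


free, missing = summarize(WINDOW_SIZE, GPU_BAR0_SIZE, UNASSIGNED)
print(f"free after GPU BAR 0: {free} bytes")
print(f"still requested: {missing / (1 << 20):.2f} MiB")
```

The GPU's BAR 0 alone fills the whole 256 MB window, so nothing is left for the remaining BARs. BAR 5 has the same size (0x40000 = 262144) as the register MMIO region reported in the log from the 4 GB run, so it is plausibly the register BAR, and the -12 probe failure may come from that BAR being unassigned.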




