Thanks Tom.

I have found the root cause: a copy error when initializing the SMU function table.

 	case AMDGPU_FAMILY_CZ:
-		hwmgr->smumgr_funcs = &ci_smu_funcs;
+		hwmgr->smumgr_funcs = &cz_smu_funcs;

Best Regards
Rex

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Tom St Denis
Sent: Tuesday, September 26, 2017 2:26 AM
To: amd-gfx at lists.freedesktop.org
Subject: Re: powerplay change breaks driver

To narrow things down, it's likely something in the CZ code paths, as it still crashes with the Polaris10 removed.

Tom

On 25/09/17 01:55 PM, Tom St Denis wrote:
> This change
>
> commit f96306921d5e346ebc82c7c51ae6e0b736e5b425
> Author: Rex Zhu <Rex.Zhu at amd.com>
> Date:  Wed Sep 20 14:44:55 2017 +0800
>
>    drm/amd/powerplay: refine powerplay code.
>
>    delete struct smumgr, put smu backend function table
>    in struct hwmgr
>
>    Change-Id: I7b73ef062b147b4e7199105a3c101f6c8038cc57
>    Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
>    Signed-off-by: Rex Zhu <Rex.Zhu at amd.com>
>
> results in these dmesg error messages on my Carrizo + Polaris10 setup:
>
> [  24.237785] [drm] amdgpu kernel modesetting enabled.
> [  24.237814] checking generic (c0000000 7e9000) vs hw (e0000000 10000000)
> [  24.237864] amdgpu 0000:00:01.0: enabling device (0006 -> 0007)
> [  24.238366] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x1002:0x1E10 0xE1).
> [  24.238394] [drm] register mmio base: 0xD1300000
> [  24.238394] [drm] register mmio size: 262144
> [  24.238463] ACPI Error: [\_SB_.ALIB] Namespace lookup failure, AE_NOT_FOUND (20170531/psargs-364)
> [  24.238497] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.ATC0, AE_NOT_FOUND (20170531/psparse-550)
> [  24.238528] ACPI Error: Method parse/execution failed \_SB.PCI0.VGA.ATCS, AE_NOT_FOUND (20170531/psparse-550)
> [  24.238558] [drm] UVD is enabled in physical mode
> [  24.238561] [drm] VCE enabled in physical mode
> [  24.250365] ATOM BIOS: 109-C95010-001
> [  24.250381] [drm] GPU post is not needed
> [  24.250407] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 9-bit
> [  24.250412] amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> [  24.250413] amdgpu 0000:00:01.0: GTT: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
> [  24.250420] [drm] Detected VRAM RAM=512M, BAR=512M
> [  24.250421] [drm] RAM width 64bits UNKNOWN
> [  24.250795] [TTM] Zone kernel: Available graphics memory: 3846244 kiB
> [  24.250797] [TTM] Zone  dma32: Available graphics memory: 2097152 kiB
> [  24.250797] [TTM] Initializing pool allocator
> [  24.250801] [TTM] Initializing DMA pool allocator
> [  24.250844] [drm] amdgpu: 512M of VRAM memory ready
> [  24.250845] [drm] amdgpu: 3072M of GTT memory ready.
> [  24.250860] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [  24.250970] [drm] PCIE GART of 1024M enabled (table at 0x000000F400040000).
> [  24.251017] amdgpu 0000:00:01.0: amdgpu: using MSI.
> [  24.251034] [drm] amdgpu: irq initialized.
> [  24.251037] amdgpu: [powerplay] amdgpu: powerplay sw initialized
> [  24.254140] [drm] Chained IB support enabled!
> [  24.257056] amdgpu 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000000400080, cpu addr 0xffffc9000105d080
> [  24.257196] amdgpu 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000000400100, cpu addr 0xffffc9000105d100
> [  24.257922] amdgpu 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000000400180, cpu addr 0xffffc9000105d180
> [  24.258053] amdgpu 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000000400200, cpu addr 0xffffc9000105d200
> [  24.258115] amdgpu 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000000400280, cpu addr 0xffffc9000105d280
> [  24.258146] amdgpu 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000400300, cpu addr 0xffffc9000105d300
> [  24.258353] amdgpu 0000:00:01.0: fence driver on ring 6 use gpu addr 0x0000000000400380, cpu addr 0xffffc9000105d380
> [  24.258426] amdgpu 0000:00:01.0: fence driver on ring 7 use gpu addr 0x0000000000400400, cpu addr 0xffffc9000105d400
> [  24.258484] amdgpu 0000:00:01.0: fence driver on ring 8 use gpu addr 0x0000000000400480, cpu addr 0xffffc9000105d480
> [  24.258528] amdgpu 0000:00:01.0: fence driver on ring 9 use gpu addr 0x0000000000400520, cpu addr 0xffffc9000105d520
> [  24.260159] amdgpu 0000:00:01.0: fence driver on ring 10 use gpu addr 0x00000000004005a0, cpu addr 0xffffc9000105d5a0
> [  24.260508] amdgpu 0000:00:01.0: fence driver on ring 11 use gpu addr 0x0000000000400620, cpu addr 0xffffc9000105d620
> [  24.261591] [drm] Found UVD firmware Version: 1.91 Family ID: 11
> [  24.262451] amdgpu 0000:00:01.0: fence driver on ring 12 use gpu addr 0x000000f400296560, cpu addr 0xffffc90003442560
> [  24.263350] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
> [  24.263819] amdgpu 0000:00:01.0: fence driver on ring 13 use gpu addr 0x0000000000400720, cpu addr 0xffffc9000105d720
> [  24.263921] amdgpu 0000:00:01.0: fence driver on ring 14 use gpu addr 0x00000000004007a0, cpu addr 0xffffc9000105d7a0
> [  24.264438] amdgpu: [powerplay] Fail to get clock table from SMU!
> [  24.264440] amdgpu: [powerplay] amdgpu: powerplay initialization failed
> [  24.264467] [drm] DAL is enabled
> [  24.264835] [drm] DC: create_links: connectors_num: physical:3, virtual:0
> [  24.264839] [drm] Connector[0] description:signal 32
> [  24.264842] [drm] Using channel: CHANNEL_ID_DDC1 [1]
> [  24.264851] [drm] Connector[1] description:signal 4
> [  24.264853] [drm] Using channel: CHANNEL_ID_DDC2 [2]
> [  24.264860] [drm] Connector[2] description:signal 4
> [  24.264862] [drm] Using channel: CHANNEL_ID_DDC3 [3]
> [  24.564284] [drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
> [  24.564329] [drm] Display Core initialized
> [  24.564332] [drm] amdgpu: freesync_module init done ffff88021048afe0.
> [  24.564564] [drm] link=0, dc_sink_in=         (null) is now Disconnected
> [  24.564565] [drm] DCHPD: connector_id=0: dc_sink didn't change.
> [  24.564624] [drm] link=1, dc_sink_in=         (null) is now Disconnected
> [  24.564624] [drm] DCHPD: connector_id=1: dc_sink didn't change.
> [  24.564738] [drm] link=2, dc_sink_in=         (null) is now Disconnected
> [  24.564739] [drm] DCHPD: connector_id=2: dc_sink didn't change.
> [  24.564751] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [  24.564752] [drm] Driver supports precise vblank timestamp query.
> [  24.564752] [drm] KMS initialized.
> [  24.566110] [drm] ring test on 0 succeeded in 13 usecs
> [  24.755765] [drm:gfx_v8_0_kiq_resume [amdgpu]] *ERROR* KCQ enable failed (scratch(0xC040)=0xCAFEDEAD)
> [  24.755819] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -22
> [  24.755839] amdgpu 0000:00:01.0: amdgpu_init failed
> [  24.756271] BUG: unable to handle kernel NULL pointer dereference at         (null)
> [  24.756302] IP:          (null)
> [  24.756312] PGD 2134b3067
> [  24.756312] P4D 2134b3067
> [  24.756320] PUD 0
>
> [  24.756340] Oops: 0010 [#1] SMP
> [  24.756349] Modules linked in: amdgpu(+) chash ttm ax88179_178a usbnet xhci_pci xhci_hcd efivarfs
> [  24.756380] CPU: 3 PID: 3021 Comm: modprobe Not tainted 4.13.0-rc5+ #33
> [  24.756396] Hardware name: AMD Myrtle/Myrtle, BIOS TMY1100A 03/23/2016
> [  24.756413] task: ffff8802132744c0 task.stack: ffffc90000fd0000
> [  24.756427] RIP: 0010:         (null)
> [  24.756437] RSP: 0018:ffffc90000fd3908 EFLAGS: 00010202
> [  24.756450] RAX: ffff88021048a460 RBX: ffff8802100258a0 RCX: 000000018020000d
> [  24.756466] RDX: 000000018020000e RSI: 0000000000005c02 RDI: ffff88021048a5a0
> [  24.756482] RBP: ffffc90000fd3928 R08: ffff880210f9e580 R09: 000000018020000d
> [  24.756499] R10: ffffc90000fd3948 R11: ffffea0008525e00 R12: 0000000000005c02
> [  24.756516] R13: ffff88021365b690 R14: ffff880211db0040 R15: ffff880211db2f30
> [  24.756534] FS: 00007ffa8be38700(0000) GS:ffff88021ed80000(0000) knlGS:0000000000000000
> [  24.756554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  24.756569] CR2: 0000000000000000 CR3: 0000000210030000 CR4: 00000000001406e0
> [  24.756586] Call Trace:
> [  24.756745] ? destroy+0x31/0x100 [amdgpu]
> [  24.756822] dal_i2caux_destruct+0x5d/0x90 [amdgpu]
> [  24.756875] destroy+0x15/0x30 [amdgpu]
> [  24.756925] dal_i2caux_destroy+0x1b/0x30 [amdgpu]
> [  24.756977] destruct+0x90/0x140 [amdgpu]
> [  24.757028] dc_destroy+0x10/0x30 [amdgpu]
> [  24.757083] amdgpu_dm_fini+0x62/0x70 [amdgpu]
> [  24.757137] dm_hw_fini+0x1d/0x30 [amdgpu]
> [  24.757183] amdgpu_fini+0xe8/0x330 [amdgpu]
> [  24.757229] amdgpu_device_init+0xe5a/0x1560 [amdgpu]
> [  24.757245] ? kmalloc_order_trace+0x29/0xd0
> [  24.757290] ? amdgpu_driver_load_kms+0x53/0x200 [amdgpu]
> [  24.757338] amdgpu_driver_load_kms+0x78/0x200 [amdgpu]
> [  24.757353] drm_dev_register+0x141/0x1d0
> [  24.757393] amdgpu_pci_probe+0x113/0x140 [amdgpu]
> [  24.757406] local_pci_probe+0x40/0xa0
> [  24.757416] pci_device_probe+0xaa/0x130
> [  24.757426] driver_probe_device+0x23e/0x2d0
> [  24.757437] __driver_attach+0x96/0xa0
> [  24.757446] ? driver_probe_device+0x2d0/0x2d0
> [  24.757457] bus_for_each_dev+0x5b/0x90
> [  24.757467] driver_attach+0x19/0x20
> [  24.757476] bus_add_driver+0x11c/0x220
> [  24.757485] driver_register+0x5b/0xd0
> [  24.757495] __pci_register_driver+0x47/0x50
> [  24.757532] amdgpu_init+0x88/0x9b [amdgpu]
> [  24.757544] ? 0xffffffffa030a000
> [  24.757554] do_one_initcall+0x3e/0x160
> [  24.757566] ? __vunmap+0x7c/0xb0
> [  24.757577] ? kfree+0x147/0x160
> [  24.757587] ? kmem_cache_alloc_trace+0x33/0x150
> [  24.757602] do_init_module+0x5a/0x1f1
> [  24.757614] load_module+0x2329/0x28d0
> [  24.758259] ? kernel_read_file+0x19e/0x1c0
> [  24.758898] SYSC_finit_module+0xba/0xc0
> [  24.759524] ? SYSC_finit_module+0xba/0xc0
> [  24.760206] SyS_finit_module+0x9/0x10
> [  24.760835] entry_SYSCALL_64_fastpath+0x13/0x94
> [  24.761450] RIP: 0033:0x7ffa8b310219
> [  24.762137] RSP: 002b:00007ffe64b86b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [  24.762851] RAX: ffffffffffffffda RBX: 00000055ee325090 RCX: 00007ffa8b310219
> [  24.763487] RDX: 0000000000000000 RSI: 00000055edf2d2a6 RDI: 0000000000000005
> [  24.764116] RBP: 00000055ee326f50 R08: 0000000000000000 R09: 0000000000000000
> [  24.764716] R10: 0000000000000005 R11: 0000000000000246 R12: 00000055ee3252f0
> [  24.765298] R13: 00007ffe64b86ad8 R14: 00007ffe64b86ae0 R15: 0000000000000000
> [  24.765878] Code: Bad RIP value.
> [  24.766464] RIP:          (null) RSP: ffffc90000fd3908
> [  24.767036] CR2: 0000000000000000
> [  24.767717] ---[ end trace 636f871b29b747e7 ]---
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
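
For readers following the thread, here is a minimal sketch of the kind of copy error described in the reply at the top of this message: a per-family switch that assigns the wrong backend function table. Everything in it is a simplified, hypothetical stand-in (the enum, the struct layouts, the get_clock_table callback, and the init_buggy/init_fixed helpers are assumptions for illustration, not the real amdgpu definitions). Only the names ci_smu_funcs, cz_smu_funcs and smumgr_funcs are taken from the patch hunk. In this sketch the CI table simply lacks the callback, which is an assumption chosen to make the effect visible; it does not claim to reproduce the exact failure path in the log above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical family IDs; the real driver uses AMDGPU_FAMILY_* values. */
enum family { FAMILY_CI, FAMILY_CZ };

/* Hypothetical, heavily reduced backend function table. */
struct smu_funcs {
    const char *name;
    int (*get_clock_table)(void);   /* NULL if the backend has no such hook */
};

static int cz_get_clock_table(void) { return 0; }

/* Assumption for illustration: the CI table has no clock-table hook. */
static const struct smu_funcs ci_smu_funcs = { "ci", NULL };
static const struct smu_funcs cz_smu_funcs = { "cz", cz_get_clock_table };

struct hwmgr {
    const struct smu_funcs *smumgr_funcs;
};

/* Copy error: the CZ case was pasted from the CI case and not updated. */
static void init_buggy(struct hwmgr *h, enum family f)
{
    switch (f) {
    case FAMILY_CI:
        h->smumgr_funcs = &ci_smu_funcs;
        break;
    case FAMILY_CZ:
        h->smumgr_funcs = &ci_smu_funcs;   /* wrong table */
        break;
    }
}

/* Same switch with the one-line correction from the patch hunk. */
static void init_fixed(struct hwmgr *h, enum family f)
{
    switch (f) {
    case FAMILY_CI:
        h->smumgr_funcs = &ci_smu_funcs;
        break;
    case FAMILY_CZ:
        h->smumgr_funcs = &cz_smu_funcs;
        break;
    }
}

/* Guarded dispatch: returns -1 instead of calling through a NULL hook. */
static int get_clock_table(const struct hwmgr *h)
{
    if (h->smumgr_funcs == NULL || h->smumgr_funcs->get_clock_table == NULL)
        return -1;
    return h->smumgr_funcs->get_clock_table();
}
```

With init_buggy(), a CZ device ends up dispatching through the CI table, so get_clock_table() returns -1 in this sketch; with init_fixed() it reaches cz_get_clock_table() and returns 0. The NULL check in the dispatcher is a design choice of the sketch: it turns a wrong table into an error return rather than a call through a NULL pointer.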