#regzbot introduced: 1cfb4d612127
#regzbot title: rx7600 stopped working after "1cfb4d612127 drm/amdgpu: put MQDs in VRAM"

Hi all,

I've been playing with an RX7600 and observed that amdgpu stopped working between kernels 6.2 and 6.5. I then narrowed it down to 6.4 <-> 6.5-rc1, and finally bisect pointed at 1cfb4d6121276a829aa94d0e32a7f5e1830ebc21. I also manually checked whether it boots/works on the previous commit and on the mentioned one.

I guess the log also reveals a warning in the error path; please see below. I didn't check any further.

This is a simple Debian testing system with the following cmdline options:

root@avadebian:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.6-rc7+ ignore_loglevel root=/dev/nvme1n1p2 ro nr_cpus=32

So far a simple revert (patch is below) returns things back to normal-ish: there are huge graphics artifacts on Xorg/X11 on kernels from 6.1 up to upstream, but Wayland-based sway works great without issues. Not sure where I should report this.

Please let me know if I can help with debugging or testing, or provide some other logs regarding 1cfb4d612127. Are there any cmdline options to collect more info?

Thanks,
Alexey

From 214372d5cedcf8757dd80d5f4d058377a3d92c52 Mon Sep 17 00:00:00 2001
From: Alexey Klimov <alexey.klimov@xxxxxxxxxx>
Date: Thu, 26 Oct 2023 17:01:02 +0100
Subject: [PATCH] drm/amdgpu: Revert "drm/amdgpu: put MQDs in VRAM"

This reverts commit 1cfb4d6121276a829aa94d0e32a7f5e1830ebc21.
The amdgpu driver fails during initialisation with RX7600/gfx11 on the ADLINK Ampere Altra Developer Platform (AVA developer platform) with the mentioned commit:

[   12.559893] [drm] Display Core v3.2.247 initialized on DCN 3.2.1
[   12.565906] [drm] DP-HDMI FRL PCON supported
[   12.572192] [drm] DMUB hardware initialized: version=0x07000C00
[   12.582541] snd_hda_intel 000d:03:00.1: bound 000d:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[   12.625357] [drm] kiq ring mec 3 pipe 1 q 0
[   12.857087] amdgpu 000d:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.0 test failed (-110)
[   12.867930] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v11_0> failed -110
[   12.877289] amdgpu 000d:03:00.0: amdgpu: amdgpu_device_ip_init failed
[   12.883723] amdgpu 000d:03:00.0: amdgpu: Fatal error during GPU init
[   12.890070] amdgpu 000d:03:00.0: amdgpu: amdgpu: finishing device.
[   12.896586] [drm] DSC precompute is not needed.
[   12.901142] ------------[ cut here ]------------
[   12.905747] WARNING: CPU: 0 PID: 212 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:615 amdgpu_irq_put+0xa8/0xc8 [amdgpu]
[   12.916841] Modules linked in: hid_generic(E) usbhid(E) hid(E) qrtr(E) iptable_nat(E) amdgpu(E+) nf_nat(E) nf_conntrack(E) snd_hda_codec_hdmi(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) iptable_mangle(E) iptable_filter(E) amdxcp(E) drm_exec(E) gpu_sched(E) snd_hda_intel(E) aes_ce_blk(E) snd_intel_dspcfg(E) drm_buddy(E) aes_ce_cipher(E) snd_hda_codec(E) xhci_pci(E) video(E) crct10dif_ce(E) polyval_ce(E) snd_hda_core(E) xhci_hcd(E) drm_suballoc_helper(E) snd_hwdep(E) polyval_generic(E) drm_ttm_helper(E) snd_pcm(E) ghash_ce(E) ast(E) ttm(E) gf128mul(E) snd_timer(E) ipmi_ssif(E) drm_display_helper(E) drm_shmem_helper(E) sha2_ce(E) sha256_arm64(E) ipmi_devintf(E) usbcore(E) snd(E) drm_kms_helper(E) igb(E) sha1_ce(E) sbsa_gwdt(E) ipmi_msghandler(E) arm_spe_pmu(E) soundcore(E) usb_common(E) i2c_algo_bit(E) cppc_cpufreq(E) i2c_designware_platform(E) arm_dsu_pmu(E) arm_cmn(E) xgene_hwmon(E) i2c_designware_core(E) evdev(E) binfmt_misc(E) loop(E) fuse(E) efi_pstore(E) drm(E) dm_mod(E) dax(E) configfs(E) efivarfs(E)
[   12.916916]  ip_tables(E) x_tables(E) autofs4(E)
[   13.011111] CPU: 0 PID: 212 Comm: kworker/0:2 Tainted: G E 6.6.0-rc7+ #23
[   13.019277] Hardware name: ADLINK Ampere Altra Developer Platform/Ampere Altra Developer Platform, BIOS TianoCore 2.04.100.10 (SYS: 2.06.20220308) 04/18/2
[   13.033084] Workqueue: events work_for_cpu_fn
[   13.037434] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   13.044384] pc : amdgpu_irq_put+0xa8/0xc8 [amdgpu]
[   13.049652] lr : amdgpu_fence_driver_hw_fini+0x118/0x160 [amdgpu]
[   13.056220] sp : ffff80008012bc10
[   13.059522] x29: ffff80008012bc20 x28: 0000000000000000 x27: 0000000000000000
[   13.066647] x26: 0000000000000000 x25: ffff07ff98580010 x24: ffff07ff98580000
[   13.073772] x23: ffff07ff985a78f0 x22: ffff07ff98580010 x21: ffff07ff985904c8
[   13.080896] x20: ffff07ff985900e8 x19: ffff07ff98598580 x18: 0000000000000006
[   13.088020] x17: 0000000000000020 x16: ffffbb510d0d7140 x15: fffffffffffffefb
[   13.095145] x14: 0000000000000000 x13: 2e64656465656e20 x12: ffff07ff8c7fd9e0
[   13.102268] x11: 00000000000003e8 x10: ffff07ff8c7fd9e0 x9 : ffffbb50ac3345e0
[   13.109392] x8 : ffffbb50abf18000 x7 : 0000000000000000 x6 : 000000007a456104
[   13.116516] x5 : 0000000000000000 x4 : ffff07ff98580000 x3 : 0000000000000000
[   13.123641] x2 : 0000000000000000 x1 : ffff07ff985a78f0 x0 : ffff07ffc5fd4000
[   13.130765] Call trace:
[   13.133200]  amdgpu_irq_put+0xa8/0xc8 [amdgpu]
[   13.138121]  amdgpu_device_fini_hw+0xb8/0x380 [amdgpu]
[   13.143732]  amdgpu_driver_unload_kms+0x54/0x80 [amdgpu]
[   13.149517]  amdgpu_driver_load_kms+0x100/0x1c0 [amdgpu]
[   13.155301]  amdgpu_pci_probe+0x134/0x428 [amdgpu]
[   13.160564]  local_pci_probe+0x48/0xb8
[   13.164305]  work_for_cpu_fn+0x24/0x40
[   13.168043]  process_one_work+0x170/0x3d0
[   13.172042]  worker_thread+0x2bc/0x3e0
[   13.175781]  kthread+0x118/0x128
[   13.178999]  ret_from_fork+0x10/0x20
[   13.182564] ---[ end trace 0000000000000000 ]---
...
[   16.984679] amdgpu: probe of 000d:03:00.0 failed with error -110

Cc: Luben Tuikov <luben.tuikov@xxxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM")
Signed-off-by: Alexey Klimov <alexey.klimov@xxxxxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 9 ++-------
 drivers/gpu/drm/amd/amdgpu/mes_v10_1.c  | 1 -
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  | 1 -
 3 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 2382921710ec..1f2d8be0fc44 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -382,11 +382,6 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
 	int r, i, j;
 	struct amdgpu_kiq *kiq = &adev->gfx.kiq[xcc_id];
 	struct amdgpu_ring *ring = &kiq->ring;
-	u32 domain = AMDGPU_GEM_DOMAIN_GTT;
-
-	/* Only enable on gfx10 and 11 for now to avoid changing behavior on older chips */
-	if (adev->ip_versions[GC_HWIP][0] >= IP_VERSION(10, 0, 0))
-		domain |= AMDGPU_GEM_DOMAIN_VRAM;
 
 	/* create MQD for KIQ */
 	if (!adev->enable_mes_kiq && !ring->mqd_obj) {
@@ -421,7 +416,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
 			ring = &adev->gfx.gfx_ring[i];
 			if (!ring->mqd_obj) {
 				r = amdgpu_bo_create_kernel(adev, mqd_size, PAGE_SIZE,
-							    domain, &ring->mqd_obj,
+							    AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
 							    &ring->mqd_gpu_addr, &ring->mqd_ptr);
 				if (r) {
 					dev_warn(adev->dev, "failed to create ring mqd bo (%d)", r);
@@ -445,7 +440,7 @@ int amdgpu_gfx_mqd_sw_init(struct amdgpu_device *adev,
 		ring = &adev->gfx.compute_ring[j];
 		if (!ring->mqd_obj) {
 			r = amdgpu_bo_create_kernel(adev, mqd_size, PAGE_SIZE,
-						    domain, &ring->mqd_obj,
+						    AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
 						    &ring->mqd_gpu_addr, &ring->mqd_ptr);
 			if (r) {
 				dev_warn(adev->dev, "failed to create ring mqd bo (%d)", r);
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index eb06d749876f..080e7eb3f98d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -898,7 +898,6 @@ static int mes_v10_1_mqd_sw_init(struct amdgpu_device *adev,
 		return 0;
 
 	r = amdgpu_bo_create_kernel(adev, mqd_size, PAGE_SIZE,
-				    AMDGPU_GEM_DOMAIN_VRAM |
 				    AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
 				    &ring->mqd_gpu_addr, &ring->mqd_ptr);
 	if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 6827d547042e..0608710306b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1004,7 +1004,6 @@ static int mes_v11_0_mqd_sw_init(struct amdgpu_device *adev,
 		return 0;
 
 	r = amdgpu_bo_create_kernel(adev, mqd_size, PAGE_SIZE,
-				    AMDGPU_GEM_DOMAIN_VRAM |
 				    AMDGPU_GEM_DOMAIN_GTT, &ring->mqd_obj,
 				    &ring->mqd_gpu_addr, &ring->mqd_ptr);
 	if (r) {
-- 
2.42.0
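
For anyone reading along, here is a small userspace sketch of the MQD placement choice that this revert removes from amdgpu_gfx_mqd_sw_init(). It is only a model of the logic visible in the diff, not driver code: the two helper names are hypothetical, the domain values are what I believe include/uapi/drm/amdgpu_drm.h defines, and the IP_VERSION() packing is my understanding of the 6.6 macro.

```c
#include <stdint.h>

/* Assumed to match include/uapi/drm/amdgpu_drm.h */
#define AMDGPU_GEM_DOMAIN_GTT  0x2
#define AMDGPU_GEM_DOMAIN_VRAM 0x4

/* Assumed to match the kernel's major/minor/revision packing in 6.6 */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/*
 * Hypothetical helper: allowed placement domains for gfx/compute MQDs
 * with 1cfb4d612127 applied. GTT is always allowed; VRAM is added for
 * GC IP versions 10.0.0 and newer (gfx10 and gfx11).
 */
static inline uint32_t mqd_domain_with_commit(uint32_t gc_ip_version)
{
	uint32_t domain = AMDGPU_GEM_DOMAIN_GTT;

	if (gc_ip_version >= IP_VERSION(10, 0, 0))
		domain |= AMDGPU_GEM_DOMAIN_VRAM;

	return domain;
}

/*
 * Hypothetical helper: placement domains after the revert. The IP
 * version is ignored and MQDs go to GTT only, as before the commit.
 */
static inline uint32_t mqd_domain_with_revert(uint32_t gc_ip_version)
{
	(void)gc_ip_version;
	return AMDGPU_GEM_DOMAIN_GTT;
}
```

With the commit applied, any gfx11 part gets GTT | VRAM as the allowed mask, so the buffer may end up in VRAM; with the revert it is GTT only. The sketch says nothing about why VRAM placement makes the comp_1.0.0 ring test time out on this platform.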