On 2/6/22 09:17, Tomasz Moń wrote:
On Mon, Jul 12, 2021 at 11:56 AM Tomasz Moń <desowin@xxxxxxxxx> wrote:
I am having trouble getting Linux to run on a 2019 MacBook Pro with a
Radeon Pro Vega 20 (4 GB). As soon as the graphical user interface
starts, the whole system freezes. This happens with every Linux kernel
version I have tried over the last few months, including 5.13.
It is significantly better on 5.17-rc2. That is, the whole system no
longer freezes; only the screen keeps blinking and shows visual
artifacts. The graphical desktop is not usable, but switching between
virtual terminals works just fine.
I have tried on amd-drm-next-5.18-2022-03-25 with the following options:
amdgpu.vm_debug=1 amdgpu.debug_evictions=1 amdgpu.dcdebugmask=0xffffffff amdgpu.dc=1 amdgpu.dcdebugmask=0xffffffff
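To double-check which amdgpu options actually end up on the kernel command line, something like the following Python sketch could be run against the contents of /proc/cmdline. The helper names parse_module_params and repeated_params are hypothetical, and the sketch only splits the string on whitespace (it ignores quoted values); it does not ask the driver which values it accepted. It also reports options that are given more than once, as amdgpu.dcdebugmask is above.

```python
from collections import defaultdict


def parse_module_params(cmdline, module="amdgpu"):
    """Collect `module.key=value` options from a kernel command line string.

    Returns a dict mapping each key to the list of values seen, in order.
    Simplification: tokens are split on whitespace, quoting is not handled.
    """
    prefix = module + "."
    params = defaultdict(list)
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        key, sep, value = token[len(prefix):].partition("=")
        # A bare `module.key` without "=" is recorded with value None.
        params[key].append(value if sep else None)
    return dict(params)


def repeated_params(params):
    """Return the sorted keys that were given more than once."""
    return sorted(key for key, values in params.items() if len(values) > 1)


# The option string quoted in this message.
OPTIONS = (
    "amdgpu.vm_debug=1 amdgpu.debug_evictions=1 "
    "amdgpu.dcdebugmask=0xffffffff amdgpu.dc=1 "
    "amdgpu.dcdebugmask=0xffffffff"
)

if __name__ == "__main__":
    parsed = parse_module_params(OPTIONS)
    for key, values in parsed.items():
        print(key, values)
    print("repeated:", repeated_params(parsed))
```

On a running system the input would be the text read from /proc/cmdline instead of the OPTIONS constant.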
Unfortunately, I am not familiar with the domain, so my understanding
is very limited. However, the UNLOAD_TA command returning 0x117 stands
out. What does the status 0x117 from UNLOAD_TA mean? Is documentation
for the PSP commands publicly available?
[ 24.931035] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 29.847661] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1712, emitted seq=1713
[ 29.847970] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 572 thread Xorg:cs0 pid 573
[ 29.848244] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 32.746329] audit: type=1131 audit(1648314775.912:67): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 32.847653] audit: type=1334 audit(1648314776.015:68): prog-id=0 op=UNLOAD
[ 32.847658] audit: type=1334 audit(1648314776.015:69): prog-id=0 op=UNLOAD
[ 32.847660] audit: type=1334 audit(1648314776.015:70): prog-id=0 op=UNLOAD
[ 33.848255] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio
[ 33.848260] ------------[ cut here ]------------
[ 33.848261] Evicting all processes
[ 33.848276] WARNING: CPU: 10 PID: 469 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1888 kfd_suspend_all_processes+0xfa/0x110 [amdgpu]
[ 33.848554] Modules linked in: amdgpu drm_ttm_helper gpu_sched
[ 33.848558] CPU: 10 PID: 469 Comm: kworker/u32:4 Not tainted 5.17.0-rc6-amd #1 771afb710c57d59790c0f2362731ed3ffe6af1f8
[ 33.848561] Hardware name: Apple Inc. MacBookPro15,3/Mac-1E7E29AD0135F9BC, BIOS 1731.100.130.0.0 (iBridge: 19.16.14242.0.0,0) 02/15/2022
[ 33.848563] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 33.848568] RIP: 0010:kfd_suspend_all_processes+0xfa/0x110 [amdgpu]
[ 33.848817] Code: c7 c7 40 9b 85 c0 41 5c 41 5d e9 c1 e8 98 f1 be 03 00 00 00 e8 27 16 e3 f1 e9 5b ff ff ff 48 c7 c7 4a cc 6f c0 e8 6d 2c 8c f2 <0f> 0b e9 26 ff ff ff 0f 0b eb c5 66 66 2e 0f 1f 84 00 00 00 00 00
[ 33.848819] RSP: 0018:ffffade780a83d08 EFLAGS: 00010246
[ 33.848821] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 33.848823] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 33.848824] RBP: ffff8d1fc4d68000 R08: 0000000000000000 R09: 0000000000000000
[ 33.848825] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d1fd79e0000
[ 33.848826] R13: 0000000000000000 R14: ffff8d1fc2a270d0 R15: ffff8d1fd79e0000
[ 33.848828] FS: 0000000000000000(0000) GS:ffff8d271ec80000(0000) knlGS:0000000000000000
[ 33.848830] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.848831] CR2: 00007fc180bfb200 CR3: 00000004a8c10003 CR4: 00000000003706e0
[ 33.848833] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 33.848834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 33.848835] Call Trace:
[ 33.848837] <TASK>
[ 33.848839] kgd2kfd_suspend.part.0+0x3d/0x40 [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 33.849084] kgd2kfd_pre_reset+0x43/0x60 [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 33.849326] amdgpu_device_gpu_recover_imp.cold+0x120/0x8e9 [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 33.849628] amdgpu_job_timedout+0x18f/0x1c0 [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 33.849887] ? finish_task_switch.isra.0+0xaa/0x290
[ 33.849892] drm_sched_job_timedout+0x77/0x120 [gpu_sched 721b514943d9cddbec8b63d5dd19fd642806bd31]
[ 33.849898] process_one_work+0x1e2/0x3b0
[ 33.849901] ? rescuer_thread+0x3a0/0x3a0
[ 33.849903] worker_thread+0x50/0x3a0
[ 33.849905] ? rescuer_thread+0x3a0/0x3a0
[ 33.849906] kthread+0xd6/0x100
[ 33.849910] ? kthread_complete_and_exit+0x20/0x20
[ 33.849913] ret_from_fork+0x1f/0x30
[ 33.849918] </TASK>
[ 33.849919] ---[ end trace 0000000000000000 ]---
[ 34.072819] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
[ 34.072824] [drm] free PSP TMR buffer
[ 34.108151] CPU: 12 PID: 469 Comm: kworker/u32:4 Tainted: G W 5.17.0-rc6-amd #1 771afb710c57d59790c0f2362731ed3ffe6af1f8
[ 34.108156] Hardware name: Apple Inc. MacBookPro15,3/Mac-1E7E29AD0135F9BC, BIOS 1731.100.130.0.0 (iBridge: 19.16.14242.0.0,0) 02/15/2022
[ 34.108158] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[ 34.108165] Call Trace:
[ 34.108167] <TASK>
[ 34.108168] dump_stack_lvl+0x48/0x66
[ 34.108174] amdgpu_do_asic_reset+0x28/0x45c [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 34.108523] amdgpu_device_gpu_recover_imp.cold+0x60e/0x8e9 [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 34.108835] amdgpu_job_timedout+0x18f/0x1c0 [amdgpu 9abbe9b6fc2429e6d465345bc384def6ac94e6a9]
[ 34.109109] ? finish_task_switch.isra.0+0xaa/0x290
[ 34.109114] drm_sched_job_timedout+0x77/0x120 [gpu_sched 721b514943d9cddbec8b63d5dd19fd642806bd31]
[ 34.109120] process_one_work+0x1e2/0x3b0
[ 34.109123] ? rescuer_thread+0x3a0/0x3a0
[ 34.109125] worker_thread+0x50/0x3a0
[ 34.109127] ? rescuer_thread+0x3a0/0x3a0
[ 34.109129] kthread+0xd6/0x100
[ 34.109132] ? kthread_complete_and_exit+0x20/0x20
[ 34.109135] ret_from_fork+0x1f/0x30
[ 34.109141] </TASK>
[ 34.109144] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
[ 34.109146] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
[ 34.109196] amdgpu 0000:03:00.0: amdgpu: GPU psp mode1 reset
[ 34.637593] [drm] psp mode1 reset succeed
[ 34.708779] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 34.708884] [drm] PCIE GART of 512M enabled.
[ 34.708886] [drm] PTB located at 0x000000F400000000
[ 34.708906] [drm] VRAM is lost due to GPU reset!
[ 34.708907] [drm] PSP is resuming...
[ 34.896778] [drm] reserve 0x400000 from 0xf4fec00000 for PSP TMR
[ 36.604868] [drm] kiq ring mec 2 pipe 1 q 0
[ 36.626895] [drm] UVD and UVD ENC initialized successfully.
[ 36.727142] [drm] VCE initialized successfully.
[ 36.727148] amdgpu 0000:03:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 36.727151] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 36.727153] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 36.727154] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 36.727155] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 36.727157] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 36.727158] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 36.727159] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 36.727160] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 36.727162] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 36.727163] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
[ 36.727164] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 1 on hub 1
[ 36.727166] amdgpu 0000:03:00.0: amdgpu: ring uvd_0 uses VM inv eng 4 on hub 1
[ 36.727167] amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 5 on hub 1
[ 36.727168] amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 6 on hub 1
[ 36.727169] amdgpu 0000:03:00.0: amdgpu: ring vce0 uses VM inv eng 7 on hub 1
[ 36.727170] amdgpu 0000:03:00.0: amdgpu: ring vce1 uses VM inv eng 8 on hub 1
[ 36.727171] amdgpu 0000:03:00.0: amdgpu: ring vce2 uses VM inv eng 9 on hub 1
[ 36.729758] amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
[ 36.729762] amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
[ 36.729765] [drm] Skip scheduling IBs!
[ 36.729785] amdgpu 0000:03:00.0: amdgpu: GPU reset(2) succeeded!
[ 36.729811] [drm] Skip scheduling IBs!
...
[ 36.731540] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
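
For anyone who wants to pull the key events out of a longer dmesg capture, here is a minimal Python sketch. It assumes the message formats seen in the log above; the function name summarize and the event labels are made up for illustration. The only decoding it does is mapping -125 to ECANCELED, which is errno 125 in the generic Linux errno numbering. It does not interpret the PSP status 0x117, since I do not know what that value means.

```python
import re

# "[ 29.848244] message" -> (timestamp, message)
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
PSP_FAIL_RE = re.compile(
    r"psp gfx command (\w+)\((0x[0-9a-fA-F]+)\) failed "
    r"and response status is \((0x[0-9a-fA-F]+)\)"
)
PARSER_FAIL_RE = re.compile(r"Failed to initialize parser (-\d+)")
RESET_DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")

# Small subset of Linux errno numbers; only what this log needs.
ERRNO_NAMES = {125: "ECANCELED"}

# Lines copied from the log above.
SAMPLE = """\
[ 29.848244] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 34.072819] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
[ 36.729785] amdgpu 0000:03:00.0: amdgpu: GPU reset(2) succeeded!
[ 36.731540] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
"""


def summarize(log_text):
    """Return a list of (timestamp, kind, detail) tuples for known events."""
    events = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue
        ts, msg = float(match.group(1)), match.group(2)
        psp = PSP_FAIL_RE.search(msg)
        parser = PARSER_FAIL_RE.search(msg)
        if "GPU reset begin!" in msg:
            events.append((ts, "reset_begin", ""))
        elif RESET_DONE_RE.search(msg):
            events.append((ts, "reset_done", ""))
        elif psp:
            detail = f"{psp.group(1)} cmd={psp.group(2)} status={psp.group(3)}"
            events.append((ts, "psp_fail", detail))
        elif parser:
            code = int(parser.group(1))
            name = ERRNO_NAMES.get(-code, "unknown")
            events.append((ts, "parser_fail", f"{code} ({name})"))
    return events


if __name__ == "__main__":
    for ts, kind, detail in summarize(SAMPLE):
        print(f"{ts:10.6f}  {kind:12s} {detail}")
```

The same function can be fed the full output of dmesg; lines that do not match one of the patterns are skipped.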