Re: Amdgpu kernel oops and freezing on system suspend and hibernate

Alex,

thanks for the hint, but...

Is this patch intended for kernel 5.11.8?

I applied the patch against 5.11.8 and it is freezing again:


Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=615, emitted seq=617
Mär 23 16:18:51 obelix kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Mär 23 16:18:51 obelix kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Mär 23 16:18:51 obelix kernel: BUG: kernel NULL pointer dereference, address: 0000000000000029
Mär 23 16:18:51 obelix kernel: #PF: supervisor read access in kernel mode
Mär 23 16:18:51 obelix kernel: #PF: error_code(0x0000) - not-present page
Mär 23 16:18:51 obelix kernel: PGD 0 P4D 0
Mär 23 16:18:51 obelix kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mär 23 16:18:51 obelix kernel: CPU: 12 PID: 178 Comm: kworker/12:1 Not tainted 5.11.8-arch1-1-custom #1
Mär 23 16:18:51 obelix kernel: Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.117 10/29/2020
Mär 23 16:18:51 obelix kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
Mär 23 16:18:51 obelix kernel: RSP: 0018:ffffa35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
Mär 23 16:18:51 obelix kernel: RDX: 000000000080005c RSI: 0000000000000001 RDI: ffff8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: ffff8b044a877bc0 R08: 0000000000000001 R09: 0000000000000000
Mär 23 16:18:51 obelix kernel: R10: 0000000000000000 R11: ffffffffafccba00 R12: ffff8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: ffff8b044bf60000 R14: ffff8b04414a1000 R15: ffff8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS: 0000000000000000(0000) GS:ffff8b075f900000(0000) knlGS:0000000000000000
Mär 23 16:18:51 obelix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029 CR3: 00000001ab010000 CR4: 0000000000350ee0
Mär 23 16:18:51 obelix kernel: Call Trace:
Mär 23 16:18:51 obelix kernel:  stop_cpsch+0xa0/0xc0 [amdgpu]
Mär 23 16:18:51 obelix kernel:  kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
Mär 23 16:18:51 obelix kernel:  kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
Mär 23 16:18:51 obelix kernel:  amdgpu_device_gpu_recover.cold+0x36e/0x95d [amdgpu]
Mär 23 16:18:51 obelix kernel:  amdgpu_job_timedout+0x121/0x140 [amdgpu]
Mär 23 16:18:51 obelix kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Mär 23 16:18:51 obelix kernel:  process_one_work+0x214/0x3e0
Mär 23 16:18:51 obelix kernel:  worker_thread+0x4d/0x3d0
Mär 23 16:18:51 obelix kernel:  ? rescuer_thread+0x3c0/0x3c0
Mär 23 16:18:51 obelix kernel:  kthread+0x133/0x150
Mär 23 16:18:51 obelix kernel:  ? __kthread_bind_mask+0x60/0x60
Mär 23 16:18:51 obelix kernel:  ret_from_fork+0x22/0x30
Mär 23 16:18:51 obelix kernel: Modules linked in: rfcomm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi cmac algif_hash snd_hda_intel algif_skcipher snd_intel_dspcfg soundwire_intel af_alg soundwire_ge>
Mär 23 16:18:51 obelix kernel: sr_mod cdrom uas usb_storage dm_crypt cbc encrypted_keys dm_mod trusted tpm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper serio_raw ccp xhc>
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029
Mär 23 16:18:51 obelix kernel: ---[ end trace 8a72c5e07cbe6b63 ]---
Mär 23 16:18:51 obelix kernel: RIP: 0010:kernel_queue_uninit+0xd/0xf0 [amdgpu]
Mär 23 16:18:51 obelix kernel: Code: ee 48 89 c7 e8 a4 f9 ff ff 84 c0 0f 84 e3 d3 1f 00 4c 89 e0 5d 41 5c 41 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 8b 47 10 48 89 fd <8b> 50 28 83 fa 02 74 78 83 fa 03 0f 84 b1 00 00 00 48 8b 7f 08 4c
Mär 23 16:18:51 obelix kernel: RSP: 0018:ffffa35d806dfd40 EFLAGS: 00010246
Mär 23 16:18:51 obelix kernel: RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
Mär 23 16:18:51 obelix kernel: RDX: 000000000080005c RSI: 0000000000000001 RDI: ffff8b044a877bc0
Mär 23 16:18:51 obelix kernel: RBP: ffff8b044a877bc0 R08: 0000000000000001 R09: 0000000000000000
Mär 23 16:18:51 obelix kernel: R10: 0000000000000000 R11: ffffffffafccba00 R12: ffff8b044c5ee0d0
Mär 23 16:18:51 obelix kernel: R13: ffff8b044bf60000 R14: ffff8b04414a1000 R15: ffff8b04414a10c8
Mär 23 16:18:51 obelix kernel: FS: 0000000000000000(0000) GS:ffff8b075f900000(0000) knlGS:0000000000000000
Mär 23 16:18:51 obelix kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 23 16:18:51 obelix kernel: CR2: 0000000000000029 CR3: 0000000105594000 CR4: 0000000000350ee0
Mär 23 16:19:10 obelix systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mär 23 16:19:10 obelix audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mär 23 16:19:10 obelix kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
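
The faulting address in the oops above can be cross-checked against the register dump. This is only a rough sketch and rests on one assumption: that the bytes marked `<8b> 50 28` in the Code: line decode as a 32-bit load from RAX + 0x28 (a reading of the opcode bytes, not confirmed by disassembling the actual amdgpu module). The `registers` helper is hypothetical and written just for this check; the register values are copied from the log.

```python
import re

# Register lines copied from the oops in this message (journal prefix dropped).
OOPS = """\
RAX: 0000000000000001 RBX: ffff8b044c5ee000 RCX: 000000000080005b
CR2: 0000000000000029 CR3: 00000001ab010000 CR4: 0000000000350ee0
"""

def registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in the dump."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

regs = registers(OOPS)

# Assumption: the faulting instruction reads from RAX + 0x28.
DISPLACEMENT = 0x28
fault_address = regs["RAX"] + DISPLACEMENT

# CR2 holds the address that triggered the page fault.
print(hex(fault_address), fault_address == regs["CR2"])
```

If that reading is right, kernel_queue_uninit picked up the value 1 where it expected a valid pointer and then dereferenced it at offset 0x28, which would explain why the reported address is 0x29 rather than 0.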

Greetings
Harvey

On 22.03.21 at 20:22, Alex Deucher wrote:
> On Thu, Mar 18, 2021 at 8:19 AM Harvey <harv@xxxxxx> wrote:
>>
>> Alex,
>>
>> I waited for kernel 5.11.7 to hit our repos yesterday evening and tested
>> again:
>>
>> 1. The suspend issue is gone - suspend and resume now work as expected.
>>
>> 2. System hibernation seems to be a different beast - still freezing
>
> You need this patch:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/711c13547aad08f2cfe996e0cddc3d56f1233081
>
> Alex
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


--
I am root. If you see me laughing, you'd better have a backup!

