Re: Linux 5.17.5

Hi,

No, the warning appeared right at the first cold boot with the patched kernel:

May  2 21:50:27 xxx kernel: WARNING: CPU: 0 PID: 1 at
drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x312/0x3f0
May  2 21:50:27 xxx kernel: Modules linked in:
May  2 21:50:27 xxx kernel: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.5 #2
May  2 21:50:27 xxx kernel: Hardware name: Micro-Star International Co., Ltd.
MS-7C94/MAG B550M MORTAR (MS-7C94), BIOS 1.94 09/23/2021
May  2 21:50:27 xxx kernel: RIP: 0010:amd_iommu_enable_interrupts+0x312/0x3f0
May  2 21:50:27 xxx kernel: Code: ff ff 49 8b 7f 18 89 04 24 e8 2a ff f6 ff 8b
04 24 e9 7b fd ff ff 0f 0b 4d 8b 3f 49 81 ff 90 15 4c 9f 0f 85 35 fd ff ff eb 82
<0f> 0b 4d 8b 3f 49 81 ff 90 15 4c 9f 0f 85 21 fd ff ff e9 6b ff ff
May  2 21:50:27 xxx kernel: RSP: 0018:ffffb9ad4005fdd8 EFLAGS: 00010246
May  2 21:50:27 xxx kernel: RAX: 00000015be386e7c RBX: 0000000000000000 RCX:
0000000000000000
May  2 21:50:27 xxx kernel: RDX: 0000000000009e16 RSI: 0000000000009427 RDI:
00000015be37d066
May  2 21:50:27 xxx kernel: RBP: 0000000080000000 R08: ffffffffffffffff R09:
0000000000000000
May  2 21:50:27 xxx kernel: R10: 00000000000000d1 R11: 0000000000000000 R12:
000ffffffffffff8
May  2 21:50:27 xxx kernel: R13: 0800000000000000 R14: 0008000000000000 R15:
ffff9a4600190000
May  2 21:50:27 xxx kernel: FS:  0000000000000000(0000)
GS:ffff9a53f1e00000(0000) knlGS:0000000000000000
May  2 21:50:27 xxx kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  2 21:50:27 xxx kernel: CR2: ffff9a51c9c01000 CR3: 0000000cc960a000 CR4:
0000000000750ef0
May  2 21:50:27 xxx kernel: PKRU: 55555554
May  2 21:50:27 xxx kernel: Call Trace:
May  2 21:50:27 xxx kernel: <TASK>
May  2 21:50:27 xxx kernel: iommu_go_to_state+0x10e0/0x138d
May  2 21:50:27 xxx kernel: ? e820__memblock_setup+0x78/0x78
May  2 21:50:27 xxx kernel: amd_iommu_init+0xa/0x20
May  2 21:50:27 xxx kernel: pci_iommu_init+0x11/0x3a
May  2 21:50:27 xxx kernel: do_one_initcall+0x47/0x180
May  2 21:50:27 xxx kernel: kernel_init_freeable+0x162/0x1a7
May  2 21:50:27 xxx kernel: ? rest_init+0xc0/0xc0
May  2 21:50:27 xxx kernel: kernel_init+0x11/0x110
May  2 21:50:27 xxx kernel: ret_from_fork+0x22/0x30
May  2 21:50:27 xxx kernel: </TASK>

For a cold boot I switch the computer off for about 30 seconds and then switch
it on again. I booted to a console and checked for warnings with `dmesg -l
warn`. Then I tried to start X with `startx`, but the screen froze. Via ssh I
issued `reboot` (a warm start). After that the warning did not appear, and I
could start X and work normally.
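
The same check can be scripted against 'kern.log' instead of reading `dmesg`
output by eye. The following is only a minimal sketch: the function name
`find_warnings` and the regular expression are illustrative assumptions, and it
expects each log entry on a single line (the entries quoted in this mail were
wrapped by the mail client and would need to be joined first).

```python
import re

# Header line of a kernel WARNING splat, for example:
# "WARNING: CPU: 0 PID: 1 at drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x312/0x3f0"
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)\+"
)


def find_warnings(lines):
    """Return (source file, line number, symbol) for every WARNING header."""
    hits = []
    for text in lines:
        m = WARNING_RE.search(text)
        if m:
            hits.append((m.group("file"), int(m.group("line")), m.group("symbol")))
    return hits


# Two entries from the log above, un-wrapped by hand.
sample = [
    "May  2 21:50:27 xxx kernel: WARNING: CPU: 0 PID: 1 at "
    "drivers/iommu/amd/init.c:851 amd_iommu_enable_interrupts+0x312/0x3f0",
    "May  2 21:50:27 xxx kernel: Modules linked in:",
]
print(find_warnings(sample))
```

An empty result after a cold boot would indicate that the warning is gone.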

In 'kern.log' I also found this:

May  2 21:53:27 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=16, emitted seq=17
May  2 21:53:27 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
information: process Xorg pid 1787 thread Xorg:cs0 pid 1788
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset begin!
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
May  2 21:53:27 xxx kernel: [drm] free PSP TMR buffer
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: MODE2 reset
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset succeeded,
trying to resume
May  2 21:53:27 xxx kernel: [drm] PCIE GART of 1024M enabled.
May  2 21:53:27 xxx kernel: [drm] PTB located at 0x000000F400900000
May  2 21:53:27 xxx kernel: [drm] PSP is resuming...
May  2 21:53:27 xxx kernel: [drm] reserve 0x400000 from 0xf4ff800000 for PSP TMR
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: RAS: optional ras ta
ucode is not available
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: RAP: optional rap ta
ucode is not available
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: SECUREDISPLAY:
securedisplay ta ucode is not available
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: SMU is resuming...
May  2 21:53:27 xxx kernel: amdgpu 0000:30:00.0: amdgpu: SMU is resumed
successfully!
May  2 21:53:27 xxx kernel: [drm] DMUB hardware initialized: version=0x0101001F
May  2 21:53:28 xxx kernel: [drm] kiq ring mec 2 pipe 1 q 0
May  2 21:53:28 xxx kernel: amdgpu 0000:30:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
May  2 21:53:28 xxx kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR*
KCQ enable failed
May  2 21:53:28 xxx kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]]
*ERROR* resume of IP block <gfx_v9_0> failed -110
May  2 21:53:28 xxx kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset(2) failed
May  2 21:53:28 xxx kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset end with ret
= -110
May  2 21:53:38 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=17, emitted seq=17
May  2 21:53:38 xxx kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
information: process Xorg pid 1787 thread Xorg:cs0 pid 1788
May  2 21:53:38 xxx kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset begin!
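
The `-110` in the amdgpu lines is a negated errno value. With Linux errno
numbering, 110 is ETIMEDOUT, so the ring test and the resume of the gfx block
timed out. A small sketch for pulling these failures out of a longer log
follows; the function name `failed_steps` and the regular expression are
illustrative assumptions, and the entries must again be un-wrapped first.

```python
import re

# Assumption: Linux errno numbering, where 110 is ETIMEDOUT.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

# An "*ERROR*" message that ends in a negative return code,
# written either as "(-110)" or as a bare "-110".
ERROR_RE = re.compile(r"\*ERROR\* (.*?)\s*\(?(-\d+)\)?$")


def failed_steps(lines):
    """Return (message, code, errno name) for each *ERROR* line with a code."""
    results = []
    for text in lines:
        m = ERROR_RE.search(text)
        if m is None:
            continue  # no trailing return code on this line
        code = int(m.group(2))
        name = LINUX_ERRNO_NAMES.get(-code, "unknown")
        results.append((m.group(1), code, name))
    return results


# Three entries from the log above, un-wrapped by hand.
sample = [
    "amdgpu 0000:30:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] "
    "*ERROR* ring kiq_2.1.0 test failed (-110)",
    "[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] "
    "*ERROR* resume of IP block <gfx_v9_0> failed -110",
    "[drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx timeout, signaled seq=16, emitted seq=17",
]
for step in failed_steps(sample):
    print(step)
```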

Thanks for your help.
Regards,
Jörg.

Joerg Roedel wrote on 02/05/2022 11:45:
[now with Vasant's correct email address]

Hi Jörg,

can you please try the attached patch? It should get rid of the WARNING
on your system.

Suravee, Vasant, can you please test and review the patch and report whether
the GA log functionality is still working?

Thanks,

	Joerg

 From 4fee768d5c23715eae31fed3b41cdf045e099aef Mon Sep 17 00:00:00 2001
From: Joerg Roedel <jroedel@xxxxxxx>
Date: Mon, 2 May 2022 11:37:43 +0200
Subject: [PATCH] iommu/amd: Do not poll GA_LOG_RUNNING mask at boot

On some hardware it takes more than a second for the hardware to get
the GA log into running state. This is too long to poll for in the AMD
IOMMU driver code.

Instead, check whether initialization was successful before polling
the log for the first time.

Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
---
  drivers/iommu/amd/amd_iommu_types.h |  3 +++
  drivers/iommu/amd/init.c            | 13 ++-----------
  drivers/iommu/amd/iommu.c           | 25 ++++++++++++++++++++++++-
  3 files changed, 29 insertions(+), 12 deletions(-)
<snip>
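
Since the diff itself is snipped, the idea from the commit message can only be
illustrated in general terms. The sketch below is not the patch: it is a
user-space model with hypothetical names (`FakeIommu`, `GaLog`,
`reads_until_running`) showing how a busy-wait at enable time can be replaced
by a single status check made when the log is first polled.

```python
class FakeIommu:
    """Hypothetical hardware model: the 'running' status shows up late."""

    def __init__(self, reads_until_running):
        self._reads_left = reads_until_running
        self.reads = 0  # number of status reads, for inspection

    def ga_log_running(self):
        self.reads += 1
        if self._reads_left > 0:
            self._reads_left -= 1
            return False
        return True


class GaLog:
    """Defers the 'is the log running?' check from enable time to poll time."""

    def __init__(self, hw):
        self.hw = hw
        self.ready = False

    def enable(self):
        # Boot path: request the log and return at once, with no polling loop.
        self.ready = False

    def poll(self):
        # Poll path: check the status once per call until it is confirmed.
        if not self.ready:
            self.ready = self.hw.ga_log_running()
            if not self.ready:
                return False  # not running yet, skip this round
        # Real code would process log entries here.
        return True


hw = FakeIommu(reads_until_running=2)
log = GaLog(hw)
log.enable()
print([log.poll() for _ in range(4)])
```

In this model the boot path never waits, and once the status has been confirmed
the poll path stops reading it.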



