Re: amdgpu didn't start with pci=nocrs parameter, get error "Fatal error during GPU init"

On 24.02.23 at 17:21, Mikhail Gavrilov wrote:
On Fri, Feb 24, 2023 at 8:31 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
Sorry, I totally missed that you attached the full dmesg to your original
mail.

Yeah, the driver did fail gracefully. But then X doesn't come up and
then gdm just dies.
Are you sure that these messages should be present when the driver
fails gracefully?

Unfortunately, yes. We could clean that up a bit more so that you don't run into a BUG() assertion, but what essentially happens here is that we completely fail to talk to the hardware.

In this situation we can't even re-enable VESA or the text console any more.

Regards,
Christian.


turning off the locking correctness validator.
CPU: 14 PID: 470 Comm: (udev-worker) Tainted: G             L
-------  ---  6.3.0-0.rc0.20230222git5b7c4cabbb65.3.fc39.x86_64+debug
#1
Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY,
BIOS G513QY.320 09/07/2022
Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x90
  register_lock_class+0x47d/0x490
  __lock_acquire+0x74/0x21f0
  ? lock_release+0x155/0x450
  lock_acquire+0xd2/0x320
  ? amdgpu_irq_disable_all+0x37/0xf0 [amdgpu]
  ? lock_is_held_type+0xce/0x120
  _raw_spin_lock_irqsave+0x4d/0xa0
  ? amdgpu_irq_disable_all+0x37/0xf0 [amdgpu]
  amdgpu_irq_disable_all+0x37/0xf0 [amdgpu]
  amdgpu_device_fini_hw+0x43/0x2c0 [amdgpu]
  amdgpu_driver_load_kms+0xe8/0x190 [amdgpu]
  amdgpu_pci_probe+0x140/0x420 [amdgpu]
  local_pci_probe+0x41/0x90
  pci_device_probe+0xc3/0x230
  really_probe+0x1b6/0x410
  __driver_probe_device+0x78/0x170
  driver_probe_device+0x1f/0x90
  __driver_attach+0xd2/0x1c0
  ? __pfx___driver_attach+0x10/0x10
  bus_for_each_dev+0x8a/0xd0
  bus_add_driver+0x141/0x230
  driver_register+0x77/0x120
  ? __pfx_init_module+0x10/0x10 [amdgpu]
  do_one_initcall+0x6e/0x350
  do_init_module+0x4a/0x220
  __do_sys_init_module+0x192/0x1c0
  do_syscall_64+0x5b/0x80
  ? asm_exc_page_fault+0x22/0x30
  ? lockdep_hardirqs_on+0x7d/0x100
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fd58cfcb1be
Code: 48 8b 0d 4d 0c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 1a 0c 0c 00 f7 d8 64 89 01
RSP: 002b:00007ffd1d1065d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b0b5aa6d70 RCX: 00007fd58cfcb1be
RDX: 000055b0b5a96670 RSI: 00000000016b6156 RDI: 00007fd589392010
RBP: 00007ffd1d106690 R08: 000055b0b5a93bd0 R09: 00000000016b6ff0
R10: 000055b5eea2c333 R11: 0000000000000246 R12: 000055b0b5a96670
R13: 0000000000020000 R14: 000055b0b5a9c170 R15: 000055b0b5aa58a0
  </TASK>
amdgpu: probe of 0000:03:00.0 failed with error -12
amdgpu 0000:08:00.0: enabling device (0006 -> 0007)
[drm] initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1043:0x16C2 0xC4).


list_add corruption. prev->next should be next (ffffffffc0940328), but
was 0000000000000000. (prev=ffff8c9b734062b0).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:30!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 14 PID: 470 Comm: (udev-worker) Tainted: G             L
-------  ---  6.3.0-0.rc0.20230222git5b7c4cabbb65.3.fc39.x86_64+debug
#1
Hardware name: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY/G513QY,
BIOS G513QY.320 09/07/2022
RIP: 0010:__list_add_valid+0x74/0x90
Code: 8d ff 0f 0b 48 89 c1 48 c7 c7 a0 3d b3 99 e8 a3 ed 8d ff 0f 0b
48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 f8 3d b3 99 e8 8c ed 8d ff <0f> 0b
48 89 f2 48 89 c1 48 89 fe 48 c7 c7 50 3e b3 99 e8 75 ed 8d
RSP: 0018:ffffa50f81aafa00 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff8c9b734062b0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
RBP: ffff8c9b734062b0 R08: 0000000000000000 R09: ffffa50f81aaf8a0
R10: 0000000000000003 R11: ffff8caa1d2fffe8 R12: ffff8c9b7c0a5e48
R13: 0000000000000000 R14: ffffffffc13a6d20 R15: 0000000000000000
FS:  00007fd58c6a5940(0000) GS:ffff8ca9d9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b0b5a955e0 CR3: 000000017e860000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
  <TASK>
  ttm_device_init+0x184/0x1c0 [ttm]
  amdgpu_ttm_init+0xb8/0x610 [amdgpu]
  ? _printk+0x60/0x80
  gmc_v9_0_sw_init+0x4a3/0x7c0 [amdgpu]
  amdgpu_device_init+0x14e5/0x2520 [amdgpu]
  amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
  amdgpu_pci_probe+0x140/0x420 [amdgpu]
  local_pci_probe+0x41/0x90
  pci_device_probe+0xc3/0x230
  really_probe+0x1b6/0x410
  __driver_probe_device+0x78/0x170
  driver_probe_device+0x1f/0x90
  __driver_attach+0xd2/0x1c0
  ? __pfx___driver_attach+0x10/0x10
  bus_for_each_dev+0x8a/0xd0
  bus_add_driver+0x141/0x230
  driver_register+0x77/0x120
  ? __pfx_init_module+0x10/0x10 [amdgpu]
  do_one_initcall+0x6e/0x350
  do_init_module+0x4a/0x220
  __do_sys_init_module+0x192/0x1c0
  do_syscall_64+0x5b/0x80
  ? asm_exc_page_fault+0x22/0x30
  ? lockdep_hardirqs_on+0x7d/0x100
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fd58cfcb1be
Code: 48 8b 0d 4d 0c 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f
84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 1a 0c 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd1d1065d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055b0b5aa6d70 RCX: 00007fd58cfcb1be
RDX: 000055b0b5a96670 RSI: 00000000016b6156 RDI: 00007fd589392010
RBP: 00007ffd1d106690 R08: 000055b0b5a93bd0 R09: 00000000016b6ff0
R10: 000055b5eea2c333 R11: 0000000000000246 R12: 000055b0b5a96670
R13: 0000000000020000 R14: 000055b0b5a9c170 R15: 000055b0b5aa58a0
  </TASK>
Modules linked in: amdgpu(+) drm_ttm_helper hid_asus ttm asus_wmi
iommu_v2 crct10dif_pclmul ledtrig_audio drm_buddy crc32_pclmul
sparse_keymap gpu_sched crc32c_intel polyval_clmulni platform_profile
hid_multitouch polyval_generic drm_display_helper nvme rfkill
ucsi_acpi ghash_clmulni_intel nvme_core typec_ucsi serio_raw
sp5100_tco ccp sha512_ssse3 r8169 cec typec nvme_common i2c_hid_acpi
video i2c_hid wmi ip6_tables ip_tables fuse
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_add_valid+0x74/0x90
Code: 8d ff 0f 0b 48 89 c1 48 c7 c7 a0 3d b3 99 e8 a3 ed 8d ff 0f 0b
48 89 d1 48 89 c6 4c 89 c2 48 c7 c7 f8 3d b3 99 e8 8c ed 8d ff <0f> 0b
48 89 f2 48 89 c1 48 89 fe 48 c7 c7 50 3e b3 99 e8 75 ed 8d
RSP: 0018:ffffa50f81aafa00 EFLAGS: 00010246
RAX: 0000000000000075 RBX: ffff8c9b734062b0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
RBP: ffff8c9b734062b0 R08: 0000000000000000 R09: ffffa50f81aaf8a0
R10: 0000000000000003 R11: ffff8caa1d2fffe8 R12: ffff8c9b7c0a5e48
R13: 0000000000000000 R14: ffffffffc13a6d20 R15: 0000000000000000
FS:  00007fd58c6a5940(0000) GS:ffff8ca9d9a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b0b5a955e0 CR3: 000000017e860000 CR4: 0000000000750ee0
PKRU: 55555554
(udev-worker) (470) used greatest stack depth: 12416 bytes left
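
[Editor's note] The "list_add corruption" message above comes from the kernel's linked-list debug check, which refuses to insert an entry when the neighbouring pointers are inconsistent. The following is a minimal user-space sketch of that kind of check, not the kernel's actual code. The helper names (list_add_tail_checked, demo_stale_entry) are hypothetical, and the scenario (an earlier list entry whose memory was cleared without being unlinked) is only an assumption about how prev->next could read as zero. The trace itself does not confirm that cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Simplified consistency check: both neighbours must point at each other.
 * The kernel's debug build would hit BUG() where this returns false. */
static bool list_add_valid(const struct list_head *prev,
                           const struct list_head *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), but was %p.\n",
                (const void *)prev, (const void *)next->prev);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), but was %p.\n",
                (const void *)next, (const void *)prev->next);
        return false;
    }
    return true;
}

/* Insert new_entry just before head (at the tail of the list).
 * Returns false, leaving the list untouched, if the check fails. */
static bool list_add_tail_checked(struct list_head *new_entry,
                                  struct list_head *head)
{
    assert(new_entry != NULL && head != NULL);

    struct list_head *prev = head->prev;

    if (!list_add_valid(prev, head))
        return false;

    head->prev = new_entry;
    new_entry->next = head;
    new_entry->prev = prev;
    prev->next = new_entry;
    return true;
}

/* Hypothetical scenario: the first entry is wiped without being unlinked,
 * so the list head still points at memory whose next pointer is now NULL. */
static bool demo_stale_entry(void)
{
    struct list_head global_list, first, second;

    init_list_head(&global_list);
    if (!list_add_tail_checked(&first, &global_list))
        return false;

    memset(&first, 0, sizeof(first)); /* teardown that forgot to unlink */

    /* prev is &first, and first.next is NULL rather than &global_list. */
    return list_add_tail_checked(&second, &global_list);
}
```

In this sketch, demo_stale_entry() returns false and prints a "prev->next should be next" message of the same shape as the one in the log. A real fix would need to ensure the entry is unlinked before its memory is released or reused.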

I thought that "gracefully" meant switching to SVGA mode and showing the
desktop with software rendering (exactly as happens when I
blacklist the amdgpu driver). Currently the boot process gets stuck and the
local console is unavailable.





