Re: [Bug][5.19-rc0] Between commits fdaf9a5840ac and babf0bb978e3 GPU stopped entering in graphic mode.

On 07.07.22 at 02:20, Mikhail Gavrilov wrote:
On Tue, Jun 28, 2022 at 2:21 PM Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:
Christian, can you look into why
drm_aperture_remove_conflicting_pci_framebuffers causes this kernel bug
on my machine?

That looks like a problem outside of the amdgpu driver.

What happens is that during load, amdgpu asks whichever driver is currently handling the framebuffer (vesafb, vgafb or efifb) to unload. That unload in turn now crashes for some reason.

My best suggestion is to try to bisect this.
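
A minimal sketch of how such a bisect could be started, for reference. It assumes that fdaf9a5840ac (the first commit named in the subject) still reaches graphics mode and that babf0bb978e3 does not; neither has been confirmed here. start_bisect and the GIT variable are hypothetical helpers for illustration, not part of git.

```shell
#!/bin/sh
# Sketch only: start a bisect between a known-good and a known-bad commit.
# GIT is a hypothetical override so the helper can be pointed at another binary.

start_bisect() {
    good=$1
    bad=$2
    # "git bisect start" takes the bad revision first, then the good one(s).
    "${GIT:-git}" bisect start "$bad" "$good"
}

# Usage, from the top of a kernel git tree (commit roles are an assumption):
#   start_bisect fdaf9a5840ac babf0bb978e3
#   ... build, install and boot the commit that git checks out ...
#   git bisect good    # amdgpu loads and the display comes up
#   git bisect bad     # the NULL pointer dereference shows up again
#   git bisect log     # record of the steps taken so far
#   git bisect reset   # return to the original HEAD when finished
```

Each round roughly halves the remaining range, so the number of builds stays manageable even for a merge-window-sized range.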

Regards,
Christian.


[    6.822385] amdgpu: Ignoring ACPI CRAT on non-APU system
[    6.822462] amdgpu: Virtual CRAT table created for CPU
[    6.822654] amdgpu: Topology: Add CPU node
[    6.827643] Console: switching to colour dummy device 80x25
[    6.845504] BUG: kernel NULL pointer dereference, address: 0000000000000038
[    6.845509] #PF: supervisor read access in kernel mode
[    6.845512] #PF: error_code(0x0000) - not-present page
[    6.845515] PGD 0 P4D 0
[    6.845518] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    6.845522] CPU: 27 PID: 612 Comm: systemd-udevd Tainted: G        W        --------  ---  5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64 #1
[    6.845528] Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4403 04/27/2022
[    6.845533] RIP: 0010:kernfs_find_and_get_ns+0x11/0x70
[    6.845539] Code: 78 e8 c3 fa 31 00 48 85 c0 75 e1 eb 93 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 49 89 d5 41 54 49 89 f4 55 53 <48> 8b 47 38 48 89 fb 48 85 c0 48 0f 44 c7 48 8b a8 80 00 00 00 48
[    6.845546] RSP: 0018:ffffa98c022f3aa0 EFLAGS: 00010246
[    6.845550] RAX: 0000000000000000 RBX: ffffffffaf52c3c0 RCX: ffff9e150147b640
[    6.845553] RDX: 0000000000000000 RSI: ffffffffaf52c508 RDI: 0000000000000000
[    6.845557] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000249249d4
[    6.845560] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffaf52c508
[    6.845563] R13: 0000000000000000 R14: ffff9e157aa93900 R15: 0000000000000000
[    6.845571] FS:  00007fabaafbf680(0000) GS:ffff9e23e6a00000(0000) knlGS:0000000000000000
[    6.845571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.845574] CR2: 0000000000000038 CR3: 000000017cb56000 CR4: 0000000000350ee0
[    6.845578] Call Trace:
[    6.845579]  <TASK>
[    6.845582]  sysfs_unmerge_group+0x18/0x60
[    6.845585]  dpm_sysfs_remove+0x20/0x60
[    6.845590]  device_del+0xa4/0x3f0
[    6.845594]  platform_device_del.part.0+0x13/0x70
[    6.845599]  platform_device_unregister+0x1c/0x30
[    6.845602]  sysfb_disable+0x2d/0x60
[    6.845605]  remove_conflicting_framebuffers+0x1b/0xc0
[    6.845610]  remove_conflicting_pci_framebuffers+0xce/0x120
[    6.845614]  drm_aperture_remove_conflicting_pci_framebuffers+0x57/0x80
[    6.845620]  amdgpu_pci_probe+0xcb/0x360 [amdgpu]
[    6.845760]  local_pci_probe+0x41/0x80
[    6.845764]  pci_device_probe+0xaa/0x210
[    6.845768]  really_probe+0x1bf/0x390
[    6.845771]  __driver_probe_device+0xfc/0x170
[    6.845775]  driver_probe_device+0x1f/0x90
[    6.845778]  __driver_attach+0xbf/0x1b0
[    6.845782]  ? __device_attach_driver+0xe0/0xe0
[    6.845785]  bus_for_each_dev+0x65/0x90
[    6.845789]  bus_add_driver+0x15c/0x200
[    6.845792]  driver_register+0x89/0xe0
[    6.845796]  ? 0xffffffffc0c8d000
[    6.845801]  do_one_initcall+0x69/0x350
[    6.845806]  ? rcu_read_lock_sched_held+0x3c/0x70
[    6.845810]  ? trace_kmalloc+0x3c/0x100
[    6.845814]  ? kmem_cache_alloc_trace+0x1e8/0x350
[    6.845818]  do_init_module+0x4a/0x200
[    6.845822]  __do_sys_init_module+0x13a/0x190
[    6.845827]  do_syscall_64+0x5b/0x80
[    6.845832]  ? asm_exc_page_fault+0x27/0x30
[    6.845835]  ? lockdep_hardirqs_on+0x7d/0x100
[    6.845839]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[    6.845842] RIP: 0033:0x7fababb7463e
[    6.845845] Code: 48 8b 0d e5 57 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b2 57 0c 00 f7 d8 64 89 01 48
[    6.845852] RSP: 002b:00007ffc6a6c9658 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    6.845857] RAX: ffffffffffffffda RBX: 00005620deef53f0 RCX: 00007fababb7463e
[    6.845860] RDX: 00005620deeb2df0 RSI: 00000000010bfac6 RDI: 00007faba943e010
[    6.845864] RBP: 00005620deeb2df0 R08: 00005620deef4880 R09: 0000000000000000
[    6.845867] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
[    6.845870] R13: 00005620deeb5330 R14: 0000000000000000 R15: 00005620deef0410
[    6.845875]  </TASK>
[    6.845877] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 crct10dif_pclmul gpu_sched crc32_pclmul crc32c_intel drm_buddy drm_display_helper ucsi_ccg nvme igb typec_ucsi ghash_clmulni_intel ccp cec typec sp5100_tco nvme_core dca wmi ip6_tables ip_tables ipmi_devintf ipmi_msghandler fuse
[    6.845898] CR2: 0000000000000038
[    6.845900] ---[ end trace 0000000000000000 ]---


$ /usr/src/kernels/5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/scripts/faddr2line \
    /lib/debug/lib/modules/5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.debug \
    amdgpu_pci_probe+0xcb
amdgpu_pci_probe+0xcb/0x360:
amdgpu_pci_probe at /usr/src/debug/kernel-5.19-rc5-49-gc1084b6c5620/linux-5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2061
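
The frames above amdgpu_pci_probe in the trace live in the core kernel rather than in amdgpu.ko, so the same faddr2line step could be pointed at vmlinux to see where in the unload path the NULL pointer is read. A sketch under assumptions: the vmlinux location below is a guess modelled on the amdgpu.ko.debug path, and resolve_frames is a hypothetical wrapper, not an existing tool.

```shell
#!/bin/sh
# Sketch only: resolve core-kernel frames from the Call Trace with faddr2line.
# Both default paths are assumptions; override FADDR2LINE and VMLINUX as needed.
KVER=5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64
FADDR2LINE=${FADDR2LINE:-/usr/src/kernels/$KVER/scripts/faddr2line}
VMLINUX=${VMLINUX:-/lib/debug/lib/modules/$KVER/vmlinux}

resolve_frames() {
    # Each argument is a func+offset token copied from the Call Trace.
    "$FADDR2LINE" "$VMLINUX" "$@"
}

# Usage:
#   resolve_frames kernfs_find_and_get_ns+0x11 sysfs_unmerge_group+0x18 sysfb_disable+0x2d
```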


$ cat -s -n /usr/src/debug/kernel-5.19-rc5-49-gc1084b6c5620/linux-5.19.0-0.rc5.20220705gitc1084b6c5620.40.fc37.x86_64/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c \
    | head -2071 | tail -20
   2052 "Use radeon.cik_support=0 amdgpu.cik_support=1 to override.\n"
   2053 );
   2054 return -ENODEV;
   2055 }
   2056 }
   2057 #endif
   2058
   2059 /* Get rid of things like offb */
   2060 ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
   2061 if (ret)
   2062 return ret;
   2063
   2064 adev = devm_drm_dev_alloc(&pdev->dev, &amdgpu_kms_driver, typeof(*adev), ddev);
   2065 if (IS_ERR(adev))
   2066 return PTR_ERR(adev);
   2067
   2068 adev->dev  = &pdev->dev;
   2069 adev->pdev = pdev;
   2070 ddev = adev_to_drm(adev);

$ git blame -L 2052,2070 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
Blaming lines: 100% (19/19), done.
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2052)                         dev_info(&pdev->dev,
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2053)                                  "Use radeon.cik_support=0 amdgpu.cik_support=1 to override.\n"
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2054)                                 );
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2055)                         return -ENODEV;
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2056)                 }
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2057)        }
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2058) #endif
984d7a929ad68 (Hans de Goede     2019-10-10 18:28:17 +0200 2059)
d38ceaf99ed01 (Alex Deucher      2015-04-20 16:55:21 -0400 2060)        /* Get rid of things like offb */
97c9bfe3f6605 (Thomas Zimmermann 2021-06-29 15:58:33 +0200 2061)        ret = drm_aperture_remove_conflicting_pci_framebuffers(pdev, &amdgpu_kms_driver);
d38ceaf99ed01 (Alex Deucher      2015-04-20 16:55:21 -0400 2062)        if (ret)
d38ceaf99ed01 (Alex Deucher      2015-04-20 16:55:21 -0400 2063)                return ret;
d38ceaf99ed01 (Alex Deucher      2015-04-20 16:55:21 -0400 2064)
5088d6572e8ff (Luben Tuikov      2020-11-04 11:04:25 +0100 2065)        adev = devm_drm_dev_alloc(&pdev->dev, &amdgpu_kms_driver, typeof(*adev), ddev);
df2ce4596c044 (Luben Tuikov      2020-09-18 15:25:04 +0200 2066)        if (IS_ERR(adev))
df2ce4596c044 (Luben Tuikov      2020-09-18 15:25:04 +0200 2067)                return PTR_ERR(adev);
8aba21b75136c (Luben Tuikov      2020-08-14 20:41:55 -0400 2068)
8aba21b75136c (Luben Tuikov      2020-08-14 20:41:55 -0400 2069)        adev->dev  = &pdev->dev;
8aba21b75136c (Luben Tuikov      2020-08-14 20:41:55 -0400 2070)        adev->pdev = pdev;

Thomas, you recently changed this line. Can you tell why we are
hitting a kernel Oops here?

Full kernel log (5.19-rc5): https://pastebin.com/5Ag804bd