https://bugzilla.kernel.org/show_bug.cgi?id=216226

Bug ID: 216226
Summary: [amdgpu] BUG: kernel NULL pointer dereference
Product: Drivers
Version: 2.5
Kernel Version: 5.19.0-rc5-next-20220708
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@xxxxxxxxxxxxxxxxxxxx
Reporter: spasswolf@xxxxxx
Regression: No

A very simple HSA test program triggers a NULL pointer dereference when opening /dev/kfd.

Program:

#include <stdlib.h>
#include <stdio.h>
#include <hsakmt.h>
#include <hsakmttypes.h>

int main()
{
	HSAKMT_STATUS ret;

	ret = hsaKmtOpenKFD();
	if (ret != HSAKMT_STATUS_SUCCESS) {
		printf("hsaKmtOpenKFD failed with status %d\n", ret);
	}

	/* If the NULL pointer dereference is triggered, it is triggered by hsaKmtOpenKFD */

	ret = hsaKmtCloseKFD();
	if (ret != HSAKMT_STATUS_SUCCESS) {
		printf("hsaKmtCloseKFD failed with status %d\n", ret);
	}

	return 0;
}

Version of libhsakmt: 5.2.0+dfsg-1 (from the Debian sid repository)

The test program does not trigger any error with linux-5.18.5.

Error message: Jul 9 10:39:32 lisa kernel: [ 35.814297] ------------[ cut here ]------------ Jul 9 10:39:32 lisa kernel: [ 35.814298] WARNING: CPU: 8 PID: 126 at drivers/gpu/drm/ttm/ttm_bo.c:704 ttm_bo_unpin+0x5a/0x70 [ttm] Jul 9 10:39:32 lisa kernel: [ 35.814307] Modules linked in: ccm(EN) rfcomm(EN) bnep(EN) cpufreq_conservative(EN) cpufreq_ondemand(EN) cpufreq_powersave(EN) cpufreq_userspace(EN) snd_ctl_led(EN) btusb(EN) btrtl(EN) snd_hda_codec_realtek(EN) btbcm(EN) btintel(EN) btmtk(EN) snd_hda_codec_generic(EN) ledtrig_audio(EN) bluetooth(EN) snd_hda_codec_hdmi(EN) snd_hda_intel(EN) snd_soc_dmic(EN) snd_intel_dspcfg(EN) snd_acp3x_rn(EN) snd_acp3x_pdm_dma(EN) snd_hda_codec(EN) jitterentropy_rng(EN) snd_soc_core(EN) uvcvideo(EN) snd_hwdep(EN) snd_hda_core(EN) videobuf2_vmalloc(EN) snd_pcm_oss(EN) videobuf2_memops(EN) sha512_generic(EN) videobuf2_v4l2(EN) nls_ascii(EN) snd_mixer_oss(EN) nls_cp437(EN) videodev(EN) ccp(EN) snd_rn_pci_acp3x(EN) snd_pcm(EN) joydev(EN) snd_acp_config(EN) msi_wmi(EN) vfat(EN) snd_timer(EN) ctr(EN) drbg(EN) snd(EN) ecdh_generic(EN) evdev(EN) sparse_keymap(EN) wmi_bmof(EN) fat(EN) ecc(EN) videobuf2_common(EN) serio_raw(EN) hid_multitouch(EN) soundcore(EN) snd_soc_acpi(EN) efi_pstore(EN) rng_core(EN) Jul 9 10:39:32 lisa kernel: [ 35.814329] k10temp(EN) battery(EN) wmi(EN) ac(EN) button(EN) video(EN) hid_sensor_accel_3d(EN) hid_sensor_magn_3d(EN) hid_sensor_prox(EN) hid_sensor_als(EN) hid_sensor_gyro_3d(EN) hid_sensor_trigger(EN) industrialio_triggered_buffer(EN) kfifo_buf(EN) industrialio(EN) hid_sensor_iio_common(EN) amd_pmc(EN) acpi_cpufreq(EN) mt7921e(EN) mt7921_common(EN) mt76_connac_lib(EN) mt76(EN) mac80211(EN) libarc4(EN) cfg80211(EN) rfkill(EN) ipmi_devintf(EN) ipmi_msghandler(EN) msr(EN) fuse(EN) configfs(EN) efivarfs(EN) autofs4(EN) ext4(EN) crc32c_generic(EN) crc32c_intel(EN) crc16(EN) mbcache(EN) jbd2(EN) usbhid(EN) amdgpu(EN) drm_ttm_helper(EN) ttm(EN) gpu_sched(EN) i2c_algo_bit(EN) drm_buddy(EN) drm_display_helper(EN) 
nvme(EN) xhci_pci(EN) drm_kms_helper(EN) xhci_hcd(EN) nvme_core(EN) r8169(EN) syscopyarea(EN) sysfillrect(EN) hid_sensor_hub(EN) sysimgblt(EN) t10_pi(EN) mfd_core(EN) fb_sys_fops(EN) hid_generic(EN) drm(EN) usbcore(EN) realtek(EN) i2c_hid_acpi(EN) mdio_devres(EN) psmouse(EN) Jul 9 10:39:32 lisa kernel: [ 35.814352] i2c_hid(EN) amd_sfh(EN) libphy(EN) crc64_rocksoft(EN) hid(EN) backlight(EN) crc64(EN) crc_t10dif(EN) i2c_piix4(EN) cec(EN) usb_common(EN) crct10dif_generic(EN) crct10dif_common(EN) i2c_designware_platform(EN) i2c_designware_core(EN) Jul 9 10:39:32 lisa kernel: [ 35.814359] CPU: 8 PID: 126 Comm: kworker/8:1 Tainted: G W E N 5.19.0-rc5-next-20220708 #89 Jul 9 10:39:32 lisa kernel: [ 35.814361] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021 Jul 9 10:39:32 lisa kernel: [ 35.814362] Workqueue: kfd_process_wq kfd_process_wq_release [amdgpu] Jul 9 10:39:32 lisa kernel: [ 35.814479] RIP: 0010:ttm_bo_unpin+0x5a/0x70 [ttm] Jul 9 10:39:32 lisa kernel: [ 35.814485] Code: 00 00 83 ab 8c 01 00 00 01 48 85 ff 74 08 48 89 de e8 5a 44 00 00 48 8b bb 38 01 00 00 5b 48 81 c7 20 08 00 00 e9 f6 a6 18 c5 <0f> 0b 5b c3 0f 0b eb ac 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 Jul 9 10:39:32 lisa kernel: [ 35.814486] RSP: 0018:ffffb06b005cfd00 EFLAGS: 00010246 Jul 9 10:39:32 lisa kernel: [ 35.814487] RAX: 0000000000000000 RBX: ffff911c2410ac58 RCX: 0000000000000000 Jul 9 10:39:32 lisa kernel: [ 35.814488] RDX: ffff911bed7acc40 RSI: 0000000000000000 RDI: ffff911c2410ac58 Jul 9 10:39:32 lisa kernel: [ 35.814489] RBP: ffff911c2410ac00 R08: ffff911be0da5d68 R09: ffff911be0da5d68 Jul 9 10:39:32 lisa kernel: [ 35.814490] R10: ffff911bc7fb7140 R11: 0000000000000001 R12: ffff911be0da5128 Jul 9 10:39:32 lisa kernel: [ 35.814490] R13: ffff911c2410ac00 R14: ffff911bd83be800 R15: ffffd06aff804b05 Jul 9 10:39:32 lisa kernel: [ 35.814491] FS: 0000000000000000(0000) GS:ffff911e9e800000(0000) knlGS:0000000000000000 Jul 9 10:39:32 lisa kernel: 
[ 35.814492] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 9 10:39:32 lisa kernel: [ 35.814493] CR2: 000055704d312208 CR3: 000000012ce10000 CR4: 0000000000750ee0 Jul 9 10:39:32 lisa kernel: [ 35.814493] PKRU: 55555554 Jul 9 10:39:32 lisa kernel: [ 35.814494] Call Trace: Jul 9 10:39:32 lisa kernel: [ 35.814495] <TASK> Jul 9 10:39:32 lisa kernel: [ 35.814497] amdgpu_bo_unpin+0x15/0x80 [amdgpu] Jul 9 10:39:32 lisa kernel: [ 35.814586] amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x350/0x420 [amdgpu] Jul 9 10:39:32 lisa kernel: [ 35.814691] kfd_process_device_free_bos+0x98/0xe0 [amdgpu] Jul 9 10:39:32 lisa kernel: [ 35.814786] kfd_process_wq_release+0x27f/0x340 [amdgpu] Jul 9 10:39:32 lisa kernel: [ 35.814876] process_one_work+0x1bd/0x310 Jul 9 10:39:32 lisa kernel: [ 35.814880] ? rescuer_thread+0x390/0x390 Jul 9 10:39:32 lisa kernel: [ 35.814881] worker_thread+0x4b/0x390 Jul 9 10:39:32 lisa kernel: [ 35.814883] ? rescuer_thread+0x390/0x390 Jul 9 10:39:32 lisa kernel: [ 35.814884] kthread+0xd4/0x100 Jul 9 10:39:32 lisa kernel: [ 35.814886] ? kthread_complete_and_exit+0x20/0x20 Jul 9 10:39:32 lisa kernel: [ 35.814888] ret_from_fork+0x22/0x30 Jul 9 10:39:32 lisa kernel: [ 35.814891] </TASK> Jul 9 10:39:32 lisa kernel: [ 35.814891] ---[ end trace 0000000000000000 ]--- Jul 9 10:39:33 lisa kernel: [ 35.816088] BUG: kernel NULL pointer dereference, address: 0000000000000008 Jul 9 10:39:33 lisa kernel: [ 35.816091] #PF: supervisor read access in kernel mode Jul 9 10:39:33 lisa kernel: [ 35.816093] #PF: error_code(0x0000) - not-present page Jul 9 10:39:33 lisa kernel: [ 35.816095] PGD 0 P4D 0 Jul 9 10:39:33 lisa kernel: [ 35.816097] Oops: 0000 [#1] PREEMPT SMP NOPTI Jul 9 10:39:33 lisa kernel: [ 35.816100] CPU: 8 PID: 126 Comm: kworker/8:1 Tainted: G W E N 5.19.0-rc5-next-20220708 #89 Jul 9 10:39:33 lisa kernel: [ 35.816103] Hardware name: Micro-Star International Co., Ltd. 
Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021 Jul 9 10:39:33 lisa kernel: [ 35.816104] Workqueue: events delayed_fput Jul 9 10:39:33 lisa kernel: [ 35.816107] RIP: 0010:dma_resv_add_fence+0x3e/0x1a0 Jul 9 10:39:33 lisa kernel: [ 35.816112] Code: 89 54 24 04 48 85 f6 74 21 48 8d 7e 38 b8 01 00 00 00 f0 0f c1 46 38 85 c0 0f 84 0e 01 00 00 8d 50 01 09 c2 0f 88 12 01 00 00 <49> 8b 46 08 48 3d 80 0a ad 85 0f 84 ec 00 00 00 48 3d 20 0a ad 85 Jul 9 10:39:33 lisa kernel: [ 35.816114] RSP: 0018:ffffb06b005cfc98 EFLAGS: 00010246 Jul 9 10:39:33 lisa kernel: [ 35.816116] RAX: 0000000000000000 RBX: ffff911bd03a8158 RCX: 0000000080200013 Jul 9 10:39:33 lisa kernel: [ 35.816117] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff911bd03a8158 Jul 9 10:39:33 lisa kernel: [ 35.816119] RBP: ffff911bd03a8000 R08: 0000000000000000 R09: 0000000000000000 Jul 9 10:39:33 lisa kernel: [ 35.816120] R10: ffff911bc7916900 R11: 0000000000000000 R12: ffff911bd0499400 Jul 9 10:39:33 lisa kernel: [ 35.816121] R13: ffff911be09e5128 R14: 0000000000000000 R15: 00000000c0991f00 Jul 9 10:39:33 lisa kernel: [ 35.816122] FS: 0000000000000000(0000) GS:ffff911e9e800000(0000) knlGS:0000000000000000 Jul 9 10:39:33 lisa kernel: [ 35.816124] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 9 10:39:33 lisa kernel: [ 35.816125] CR2: 0000000000000008 CR3: 000000010409c000 CR4: 0000000000750ee0 Jul 9 10:39:33 lisa kernel: [ 35.816127] PKRU: 55555554 Jul 9 10:39:33 lisa kernel: [ 35.816128] Call Trace: Jul 9 10:39:33 lisa kernel: [ 35.816129] <TASK> Jul 9 10:39:33 lisa kernel: [ 35.816131] amdgpu_amdkfd_gpuvm_destroy_cb+0x53/0x1c0 [amdgpu] Jul 9 10:39:33 lisa kernel: [ 35.816223] amdgpu_vm_fini+0x39/0x4e0 [amdgpu] Jul 9 10:39:33 lisa kernel: [ 35.816295] ? 
amdgpu_ctx_mgr_entity_fini+0x4b/0xd0 [amdgpu] Jul 9 10:39:33 lisa kernel: [ 35.816366] amdgpu_driver_postclose_kms+0x1cb/0x2b0 [amdgpu] Jul 9 10:39:33 lisa kernel: [ 35.816433] drm_file_free.part.0+0x201/0x250 [drm] Jul 9 10:39:33 lisa kernel: [ 35.816448] drm_release+0x60/0x110 [drm] Jul 9 10:39:33 lisa kernel: [ 35.816459] __fput+0x87/0x240 Jul 9 10:39:33 lisa kernel: [ 35.816461] delayed_fput+0x1a/0x30 Jul 9 10:39:33 lisa kernel: [ 35.816462] process_one_work+0x1bd/0x310 Jul 9 10:39:33 lisa kernel: [ 35.816464] ? rescuer_thread+0x390/0x390 Jul 9 10:39:33 lisa kernel: [ 35.816465] worker_thread+0x4b/0x390 Jul 9 10:39:33 lisa kernel: [ 35.816466] ? rescuer_thread+0x390/0x390 Jul 9 10:39:33 lisa kernel: [ 35.816467] kthread+0xd4/0x100 Jul 9 10:39:33 lisa kernel: [ 35.816469] ? kthread_complete_and_exit+0x20/0x20 Jul 9 10:39:33 lisa kernel: [ 35.816471] ret_from_fork+0x22/0x30 Jul 9 10:39:33 lisa kernel: [ 35.816473] </TASK> Jul 9 10:39:33 lisa kernel: [ 35.816473] Modules linked in: ccm(EN) rfcomm(EN) bnep(EN) cpufreq_conservative(EN) cpufreq_ondemand(EN) cpufreq_powersave(EN) cpufreq_userspace(EN) snd_ctl_led(EN) btusb(EN) btrtl(EN) snd_hda_codec_realtek(EN) btbcm(EN) btintel(EN) btmtk(EN) snd_hda_codec_generic(EN) ledtrig_audio(EN) bluetooth(EN) snd_hda_codec_hdmi(EN) snd_hda_intel(EN) snd_soc_dmic(EN) snd_intel_dspcfg(EN) snd_acp3x_rn(EN) snd_acp3x_pdm_dma(EN) snd_hda_codec(EN) jitterentropy_rng(EN) snd_soc_core(EN) uvcvideo(EN) snd_hwdep(EN) snd_hda_core(EN) videobuf2_vmalloc(EN) snd_pcm_oss(EN) videobuf2_memops(EN) sha512_generic(EN) videobuf2_v4l2(EN) nls_ascii(EN) snd_mixer_oss(EN) nls_cp437(EN) videodev(EN) ccp(EN) snd_rn_pci_acp3x(EN) snd_pcm(EN) joydev(EN) snd_acp_config(EN) msi_wmi(EN) vfat(EN) snd_timer(EN) ctr(EN) drbg(EN) snd(EN) ecdh_generic(EN) evdev(EN) sparse_keymap(EN) wmi_bmof(EN) fat(EN) ecc(EN) videobuf2_common(EN) serio_raw(EN) hid_multitouch(EN) soundcore(EN) snd_soc_acpi(EN) efi_pstore(EN) rng_core(EN) Jul 9 10:39:33 lisa kernel: [ 
35.816495] k10temp(EN) battery(EN) wmi(EN) ac(EN) button(EN) video(EN) hid_sensor_accel_3d(EN) hid_sensor_magn_3d(EN) hid_sensor_prox(EN) hid_sensor_als(EN) hid_sensor_gyro_3d(EN) hid_sensor_trigger(EN) industrialio_triggered_buffer(EN) kfifo_buf(EN) industrialio(EN) hid_sensor_iio_common(EN) amd_pmc(EN) acpi_cpufreq(EN) mt7921e(EN) mt7921_common(EN) mt76_connac_lib(EN) mt76(EN) mac80211(EN) libarc4(EN) cfg80211(EN) rfkill(EN) ipmi_devintf(EN) ipmi_msghandler(EN) msr(EN) fuse(EN) configfs(EN) efivarfs(EN) autofs4(EN) ext4(EN) crc32c_generic(EN) crc32c_intel(EN) crc16(EN) mbcache(EN) jbd2(EN) usbhid(EN) amdgpu(EN) drm_ttm_helper(EN) ttm(EN) gpu_sched(EN) i2c_algo_bit(EN) drm_buddy(EN) drm_display_helper(EN) nvme(EN) xhci_pci(EN) drm_kms_helper(EN) xhci_hcd(EN) nvme_core(EN) r8169(EN) syscopyarea(EN) sysfillrect(EN) hid_sensor_hub(EN) sysimgblt(EN) t10_pi(EN) mfd_core(EN) fb_sys_fops(EN) hid_generic(EN) drm(EN) usbcore(EN) realtek(EN) i2c_hid_acpi(EN) mdio_devres(EN) psmouse(EN) Jul 9 10:39:33 lisa kernel: [ 35.816516] i2c_hid(EN) amd_sfh(EN) libphy(EN) crc64_rocksoft(EN) hid(EN) backlight(EN) crc64(EN) crc_t10dif(EN) i2c_piix4(EN) cec(EN) usb_common(EN) crct10dif_generic(EN) crct10dif_common(EN) i2c_designware_platform(EN) i2c_designware_core(EN) Jul 9 10:39:33 lisa kernel: [ 35.816521] CR2: 0000000000000008 Jul 9 10:39:33 lisa kernel: [ 35.816523] ---[ end trace 0000000000000000 ]--- Jul 9 10:39:33 lisa kernel: [ 35.895575] RIP: 0010:dma_resv_add_fence+0x3e/0x1a0 Jul 9 10:39:33 lisa kernel: [ 35.895585] Code: 89 54 24 04 48 85 f6 74 21 48 8d 7e 38 b8 01 00 00 00 f0 0f c1 46 38 85 c0 0f 84 0e 01 00 00 8d 50 01 09 c2 0f 88 12 01 00 00 <49> 8b 46 08 48 3d 80 0a ad 85 0f 84 ec 00 00 00 48 3d 20 0a ad 85 Jul 9 10:39:33 lisa kernel: [ 35.895588] RSP: 0018:ffffb06b005cfc98 EFLAGS: 00010246 Jul 9 10:39:33 lisa kernel: [ 35.895591] RAX: 0000000000000000 RBX: ffff911bd03a8158 RCX: 0000000080200013 Jul 9 10:39:33 lisa kernel: [ 35.895592] RDX: 0000000000000001 RSI: 
0000000000000000 RDI: ffff911bd03a8158 Jul 9 10:39:33 lisa kernel: [ 35.895593] RBP: ffff911bd03a8000 R08: 0000000000000000 R09: 0000000000000000 Jul 9 10:39:33 lisa kernel: [ 35.895594] R10: ffff911bc7916900 R11: 0000000000000000 R12: ffff911bd0499400 Jul 9 10:39:33 lisa kernel: [ 35.895595] R13: ffff911be09e5128 R14: 0000000000000000 R15: 00000000c0991f00 Jul 9 10:39:33 lisa kernel: [ 35.895597] FS: 0000000000000000(0000) GS:ffff911e9e800000(0000) knlGS:0000000000000000 Jul 9 10:39:33 lisa kernel: [ 35.895598] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 9 10:39:33 lisa kernel: [ 35.895599] CR2: 0000000000000008 CR3: 000000010409c000 CR4: 0000000000750ee0 Jul 9 10:39:33 lisa kernel: [ 35.895600] PKRU: 55555554 Jul 9 10:39:38 lisa kernel: [ 41.995638] [drm] free PSP TMR buffer lspci 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge 00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge 00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge 00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge 00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge 00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51) 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51) 00:18.0 Host bridge: Advanced Micro Devices, Inc. 
[AMD] Cezanne Data Fabric; Function 0 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7 01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3) 02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3) 03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller 04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15) 06:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD (rev 03) 07:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 500c (rev 01) 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c5) 08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller 08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor 08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 08:00.4 USB controller: Advanced Micro Devices, Inc. 
[AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub

Bisection gives

548e7432dc2da475a18077b612e8d55b8ff51891 is the first bad commit
commit 548e7432dc2da475a18077b612e8d55b8ff51891
Author: Christian König <christian.koenig@xxxxxxx>
Date:   Fri Sep 24 10:55:45 2021 +0200

    dma-buf: add dma_resv_replace_fences v2

    This function allows to replace fences from the shared fence list when we can gurantee that the operation represented by the original fence has finished or no accesses to the resources protected by the dma_resv object any more when the new fence finishes.

    Then use this function in the amdkfd code when BOs are unmapped from the process.

    v2: add an example when this is usefull.

    Signed-off-by: Christian König <christian.koenig@xxxxxxx>
    Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
    Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
    Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-1-christian.koenig@xxxxxxx

 drivers/dma-buf/dma-resv.c | 45 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 49 ++++--------------------
 include/linux/dma-resv.h | 2 +
 3 files changed, 54 insertions(+), 42 deletions(-)

as the first commit that triggers the NULL pointer dereference.

But even before this commit errors occur: Jul 9 13:07:50 lisa kernel: [ 38.775734] ------------[ cut here ]------------ Jul 9 13:07:50 lisa kernel: [ 38.775736] WARNING: CPU: 2 PID: 2509 at kernel/workqueue.c:3084 __flush_work.isra.0+0x209/0x220 Jul 9 13:07:50 lisa kernel: [ 38.775741] Modules linked in: ccm rfcomm bnep cpufreq_conservative cpufreq_ondemand cpufreq_powersave cpufreq_userspace snd_ctl_led btusb btrtl btbcm btintel btmtk bluetooth snd_hda_codec_realtek jitterentropy_rng snd_hda_codec_generic joydev ledtrig_audio snd_hda_codec_hdmi sha512_generic snd_hda_intel ctr snd_intel_dspcfg nls_ascii snd_soc_dmic snd_acp3x_pdm_dma snd_acp3x_rn snd_hda_codec uvcvideo drbg nls_cp437 snd_hwdep snd_soc_core videobuf2_vmalloc snd_hda_core videobuf2_memops vfat msi_wmi videobuf2_v4l2 ecdh_generic wmi_bmof sparse_keymap videodev ecc snd_pcm_oss fat videobuf2_common snd_mixer_oss snd_pcm snd_timer evdev ccp snd soundcore snd_rn_pci_acp3x serio_raw rng_core efi_pstore hid_multitouch k10temp hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_als hid_sensor_accel_3d wmi hid_sensor_prox hid_sensor_trigger battery button ac video industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common acpi_cpufreq amd_pmc mt7921e mt7921_common mt76_connac_lib mt76 Jul 9 13:07:50 lisa kernel: [ 38.775774] mac80211 libarc4 cfg80211 rfkill ipmi_devintf ipmi_msghandler msr fuse configfs efivarfs autofs4 ext4 crc32c_generic crc32c_intel crc16 mbcache jbd2 usbhid amdgpu drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_dp_helper drm_kms_helper syscopyarea hid_sensor_hub sysfillrect xhci_pci sysimgblt mfd_core fb_sys_fops xhci_hcd hid_generic r8169 drm nvme realtek usbcore nvme_core i2c_hid_acpi mdio_devres i2c_hid psmouse t10_pi amd_sfh crc_t10dif libphy backlight hid i2c_piix4 crct10dif_generic cec usb_common crct10dif_common i2c_designware_platform i2c_designware_core Jul 9 13:07:50 lisa kernel: [ 38.775796] CPU: 2 PID: 2509 Comm: very_simple_tes Tainted: G W 
5.17.0-rc2-00359-g701920ca9822 #104 Jul 9 13:07:50 lisa kernel: [ 38.775798] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.107 11/10/2021 Jul 9 13:07:50 lisa kernel: [ 38.775799] RIP: 0010:__flush_work.isra.0+0x209/0x220 Jul 9 13:07:50 lisa kernel: [ 38.775801] Code: 8b 4d 00 4c 8b 45 08 89 ca 48 c1 e9 04 83 e2 08 83 e1 0f 83 ca 02 89 c8 48 0f ba 6d 00 03 e9 29 ff ff ff 0f 0b e9 52 ff ff ff <0f> 0b 45 31 ed e9 48 ff ff ff e8 38 c9 69 00 0f 1f 84 00 00 00 00 Jul 9 13:07:50 lisa kernel: [ 38.775802] RSP: 0018:ffffc071c100bc68 EFLAGS: 00010246 Jul 9 13:07:50 lisa kernel: [ 38.775804] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 Jul 9 13:07:50 lisa kernel: [ 38.775804] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffa0f6a3179728 Jul 9 13:07:50 lisa kernel: [ 38.775805] RBP: ffffa0f6a3179728 R08: 0000000000000000 R09: 0000000000000000 Jul 9 13:07:50 lisa kernel: [ 38.775806] R10: 0000000000000000 R11: 0000000000000002 R12: ffffa0f6a3179728 Jul 9 13:07:50 lisa kernel: [ 38.775806] R13: 0000000000000001 R14: ffffa0f6400ffc78 R15: ffffa0f7403d0798 Jul 9 13:07:50 lisa kernel: [ 38.775807] FS: 0000000000000000(0000) GS:ffffa0f91e680000(0000) knlGS:0000000000000000 Jul 9 13:07:50 lisa kernel: [ 38.775808] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 9 13:07:50 lisa kernel: [ 38.775809] CR2: 00007fcd5c070000 CR3: 00000001016f0000 CR4: 0000000000750ee0 Jul 9 13:07:50 lisa kernel: [ 38.775809] PKRU: 55555554 Jul 9 13:07:50 lisa kernel: [ 38.775810] Call Trace: Jul 9 13:07:50 lisa kernel: [ 38.775811] <TASK> Jul 9 13:07:50 lisa kernel: [ 38.775813] ? 
wait_for_completion+0xa0/0xe0
Jul 9 13:07:50 lisa kernel: [ 38.775816] __cancel_work_timer+0xfa/0x180
Jul 9 13:07:50 lisa kernel: [ 38.775818] kfd_process_notifier_release+0x86/0x150 [amdgpu]
Jul 9 13:07:50 lisa kernel: [ 38.775943] __mmu_notifier_release+0x6e/0x200
Jul 9 13:07:50 lisa kernel: [ 38.775946] exit_mmap+0x191/0x1c0
Jul 9 13:07:50 lisa kernel: [ 38.775948] ? futex_cleanup+0xa9/0x440
Jul 9 13:07:50 lisa kernel: [ 38.775952] mmput+0x49/0x130
Jul 9 13:07:50 lisa kernel: [ 38.775954] do_exit+0x2b0/0xa20
Jul 9 13:07:50 lisa kernel: [ 38.775957] do_group_exit+0x28/0x90
Jul 9 13:07:50 lisa kernel: [ 38.775959] __x64_sys_exit_group+0xf/0x10
Jul 9 13:07:50 lisa kernel: [ 38.775960] do_syscall_64+0x3b/0x90
Jul 9 13:07:50 lisa kernel: [ 38.775962] entry_SYSCALL_64_after_hwframe+0x44/0xae
Jul 9 13:07:50 lisa kernel: [ 38.775964] RIP: 0033:0x7fcd5bf10f49
Jul 9 13:07:50 lisa kernel: [ 38.775966] Code: Unable to access opcode bytes at RIP 0x7fcd5bf10f1f.
Jul 9 13:07:50 lisa kernel: [ 38.775966] RSP: 002b:00007fff55a37058 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Jul 9 13:07:50 lisa kernel: [ 38.775968] RAX: ffffffffffffffda RBX: 00007fcd5c014920 RCX: 00007fcd5bf10f49
Jul 9 13:07:50 lisa kernel: [ 38.775968] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Jul 9 13:07:50 lisa kernel: [ 38.775969] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 00007fcd5ceeb950
Jul 9 13:07:50 lisa kernel: [ 38.775969] R10: 0000000000000005 R11: 0000000000000246 R12: 00007fcd5c014920
Jul 9 13:07:50 lisa kernel: [ 38.775970] R13: 0000000000000001 R14: 00007fcd5c019e28 R15: 0000000000000000
Jul 9 13:07:50 lisa kernel: [ 38.775971] </TASK>
Jul 9 13:07:50 lisa kernel: [ 38.775972] ---[ end trace 0000000000000000 ]---

These warnings are non-fatal, though, while the NULL pointer dereference errors lead either to an immediate hang or to a hang on rebooting. Bisection for this earlier warning is currently under way.