Re: [PATCH] drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency

On 09.01.23 at 14:13, Mikhail Gavrilov wrote:
On Fri, Jan 6, 2023 at 8:27 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:

And it looks like Dmitry submitted it initially to the wrong branch.

Because of this it wasn't scheduled as fix for 6.2, but rather queued up
as new feature for 6.3.

This is fixed by now and the patch should show up in the next -rc.

Regards,
Christian.

Hi,
Not sure whether this is related to this patch, but I caught a kernel oops this weekend.
It is very hard to reproduce, and I don't know exactly which actions trigger it,
but I'm certain that it happens when I launch "Cyberpunk 2077" while
Google Chrome is also running with a huge number of open windows and tabs.
But even these two conditions are not enough.
In some way that is not entirely clear to me, a memory leak also has to occur.

That looks like an out-of-memory situation that is not handled gracefully.

In other words we have a missing NULL check in drm_sched_job_cleanup().
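A minimal sketch of the kind of guard meant here, under stated assumptions: the types and the function below (`struct sched_job`, `struct sched_fence`, `sched_job_cleanup()`) are hypothetical, heavily simplified stand-ins, not the real gpu_sched structures or API. The sketch assumes the per-job fence pointer is still NULL when job setup fails early (for example under memory pressure), so cleanup has to check it before dereferencing it.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real drm_sched_job /
 * drm_sched_fence definitions from the kernel. */
struct sched_fence {
	int refcount;                  /* placeholder member only */
};

struct sched_job {
	struct sched_fence *s_fence;   /* assumed NULL if setup failed early */
};

/*
 * Release the job's fence if there is one.
 * Returns true if something was released, false if there was nothing to do.
 */
static bool sched_job_cleanup(struct sched_job *job)
{
	/* The missing check: a job whose setup failed before the fence was
	 * allocated has nothing to release, so bail out instead of
	 * dereferencing a NULL pointer. */
	if (!job->s_fence)
		return false;

	free(job->s_fence);   /* stands in for the real fence put/free path */
	job->s_fence = NULL;  /* makes a second cleanup call harmless */
	return true;
}
```

Clearing the pointer after releasing it is a design choice of the sketch: it makes the cleanup idempotent, so an error path that ends up calling it twice neither crashes nor double-frees.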

Going to take a look.

Thanks,
Christian.


The trace looks like:
BUG: kernel NULL pointer dereference, address: 0000000000000078
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 39818f067 P4D 39818f067 PUD 35bbd6067 PMD 4f8438067 PTE 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 21 PID: 100830 Comm: GameThread Tainted: G        W    L
-------  ---  6.2.0-0.rc2.20230105git41c03ba9beea.20.fc38.x86_64 #1
Hardware name: System manufacturer System Product Name/ROG STRIX
X570-I GAMING, BIOS 4408 10/28/2022
RIP: 0010:drm_sched_job_cleanup+0x1a/0x110 [gpu_sched]
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00
55 53 48 89 fb 48 83 ec 08 48 8b 7f 20 48 c7 04 24 00 00 00 00 <8b> 47
78 85 c0 0f 84 b5 00 00 00 48 83 ff c0 74 1f 48 8d 57 78 b8
RSP: 0018:ffffae3e16c0b9d0 EFLAGS: 00010282
RAX: 0000000000000001 RBX: ffff91de6f7bc000 RCX: 00000000012a8976
RDX: 0000000000000000 RSI: ffffffffadbda69b RDI: 0000000000000000
RBP: ffff91de6f7bc000 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffff
R13: 0000000000000018 R14: ffff91e259275000 R15: 0000000000000001
FS:  000000007bcff6c0(0000) GS:ffff91e667e00000(0000) knlGS:000000007abe0000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000078 CR3: 0000000297a24000 CR4: 0000000000350ee0
Call Trace:
  <TASK>
  amdgpu_job_free+0x1d/0x120 [amdgpu]
  amdgpu_cs_parser_fini+0x119/0x170 [amdgpu]
  amdgpu_cs_ioctl+0x3f4/0x2000 [amdgpu]
  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
  drm_ioctl_kernel+0xac/0x160
  drm_ioctl+0x1e7/0x450
  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
  __x64_sys_ioctl+0x90/0xd0
  do_syscall_64+0x5b/0x80
  ? do_syscall_64+0x67/0x80
  ? lock_is_held_type+0xe8/0x140
  ? asm_sysvec_call_function+0x16/0x20
  ? lockdep_hardirqs_on+0x7d/0x100
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fe30905e65f
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48
89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2
3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:000000007bcfd410 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000007bcfd738 RCX: 00007fe30905e65f
RDX: 000000007bcfd520 RSI: 00000000c0186444 RDI: 00000000000000b6
RBP: 000000007bcfd520 R08: 00007fe2800a6b80 R09: 000000007bcfd4b0
R10: 000000007e22b350 R11: 0000000000000246 R12: 00000000c0186444
R13: 00000000000000b6 R14: 000000000000000d R15: 00007fe2800a6ab0
  </TASK>
Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer netconsole
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep
sunrpc binfmt_misc mt76x2u mt76x2_common mt76x02_usb mt76_usb iwlmvm
mt76x02_lib mt76 mac80211 btusb iwlwifi libarc4 btrtl btbcm btintel
btmtk hid_logitech_hidpp xpad bluetooth cfg80211 ff_memless joydev
intel_rapl_msr intel_rapl_common edac_mce_amd eeepc_wmi
snd_hda_codec_realtek kvm_amd asus_wmi snd_hda_codec_generic
snd_seq_midi snd_seq_midi_event ledtrig_audio vfat asus_ec_sensors kvm
sparse_keymap platform_profile snd_hda_codec_hdmi fat snd_usb_audio
snd_hda_intel snd_intel_dspcfg snd_usbmidi_lib snd_intel_sdw_acpi
irqbypass snd_rawmidi snd_hda_codec rapl rfkill mc snd_hda_core
wmi_bmof pcspkr i2c_piix4 k10temp snd_hwdep snd_seq snd_seq_device
snd_pcm acpi_cpufreq hid_logitech_dj snd_timer snd
soundcore zram amdgpu drm_ttm_helper ttm video crct10dif_pclmul
iommu_v2 crc32_pclmul crc32c_intel drm_buddy polyval_clmulni gpu_sched
polyval_generic igb drm_display_helper nvme ucsi_ccg typec_ucsi
ghash_clmulni_intel ccp typec sha512_ssse3 nvme_core cec sp5100_tco
dca nvme_common wmi ip6_tables ip_tables fuse
CR2: 0000000000000078
---[ end trace 0000000000000000 ]---
RIP: 0010:drm_sched_job_cleanup+0x1a/0x110 [gpu_sched]
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00
55 53 48 89 fb 48 83 ec 08 48 8b 7f 20 48 c7 04 24 00 00 00 00 <8b> 47
78 85 c0 0f 84 b5 00 00 00 48 83 ff c0 74 1f 48 8d 57 78 b8
RSP: 0018:ffffae3e16c0b9d0 EFLAGS: 00010282
RAX: 0000000000000001 RBX: ffff91de6f7bc000 RCX: 00000000012a8976
RDX: 0000000000000000 RSI: ffffffffadbda69b RDI: 0000000000000000
RBP: ffff91de6f7bc000 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000ffffffff
R13: 0000000000000018 R14: ffff91e259275000 R15: 0000000000000001
FS:  000000007bcff6c0(0000) GS:ffff91e667e00000(0000) knlGS:000000007abe0000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000078 CR3: 0000000297a24000 CR4: 0000000000350ee0
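The register dump is consistent with that diagnosis. The faulting bytes marked `<8b> 47 78` in the Code: line decode as `mov eax, DWORD PTR [rdi+0x78]`, the preceding `48 8b 7f 20` loads RDI from offset 0x20 of the object passed in (the job), RDI is 0, and CR2 (the faulting address) is 0x78. In other words, a field at offset 0x78 was read through a NULL pointer. The C sketch below only illustrates that address arithmetic; `struct demo_obj` and `field_address()` are hypothetical, with padding chosen purely so the field lands at offset 0x78, and they do not reflect the real kernel struct layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: the padding just places
 * `count` at offset 0x78, matching the displacement in the faulting load. */
struct demo_obj {
	char pad[0x78];
	int count;
};

/* Address that a load of obj->count would touch, computed from the base
 * pointer value without dereferencing anything. */
static uintptr_t field_address(uintptr_t base)
{
	return base + offsetof(struct demo_obj, count);
}
```

With a base of 0, the computed address equals the CR2 value in the trace, which is why a small faulting address like this typically suggests a NULL struct pointer rather than random memory corruption.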





