On Mon, Dec 16, 2024 at 01:36:29PM -0500, Alex Deucher wrote:
> On Fri, Dec 13, 2024 at 7:53 AM Chris Rankin <rankincj@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I've just noticed this warning in my dmesg log. This is a vanilla
> > 6.12.4 kernel, with a Radeon RX6600 graphics card.
>
> That was caused by this commit:
>
> commit 746ae46c11137ba21f0c0c68f082a9d8c1222c78
> Author: Matthew Brost <matthew.brost@xxxxxxxxx>
> Date: Wed Oct 23 16:59:17 2024 -0700
>
> drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM
>
> drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path
> of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
> work queue with WQ_MEM_RECLAIM to ensure forward progress during
> reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
> progress during reclaim.
>
> However, after further discussion, I think the warning is actually a
> false positive. See this discussion:
> https://lists.freedesktop.org/archives/amd-gfx/2024-November/117349.html
>
> From the thread:
> "Question is - does check_flush_dependency() need to skip the
> !WQ_MEM_RECLAIM flushing WQ_MEM_RECLAIM warning *if* the work is already
> running *and* it was called from cancel_delayed_work_sync()?"
>

See my reply just now [1]. I’m going to have to disagree with AMD's
assessment, but I’m not certain. Again, I believe Tejun is the authority
here.

Matt

[1] https://lore.kernel.org/all/154641d9-be2a-4018-af5e-a57dbffb45d5@xxxxxxxxxx/T/#ma1ed4a99d9ad1a05f8d4648dd979d7c9d93591ff

> Thanks,
>
> Alex
>
> >
> > Cheers,
> > Chris
> >
> > [ 4624.741148] ------------[ cut here ]------------
> > [ 4624.744474] workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work
> > [gpu_sched] is flushing !WQ_MEM_RECLAIM
> > events:amdgpu_device_delay_enable_gfx_off [amdgpu]
> > [ 4624.744942] WARNING: CPU: 2 PID: 9069 at kernel/workqueue.c:3704
> > check_flush_dependency+0xbe/0xd0
> > [ 4624.765285] Modules linked in: snd_seq_dummy rpcrdma rdma_cm iw_cm
> > ib_cm ib_core af_packet nf_conntrack_netbios_ns nf_conntrack_broadcast
> > nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat
> > ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw
> > ip6table_security iptable_nat iptable_mangle iptable_raw
> > iptable_security nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack
> > nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c ebtable_filter
> > ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables
> > bnep it87 hwmon_vid binfmt_misc snd_hda_codec_realtek
> > snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_scodec_component
> > snd_hda_intel uvcvideo btusb uvc videobuf2_vmalloc btintel
> > videobuf2_memops videobuf2_v4l2 videodev btbcm snd_usb_audio bluetooth
> > snd_intel_dspcfg intel_powerclamp snd_hda_codec videobuf2_common
> > coretemp snd_virtuoso snd_usbmidi_lib snd_oxygen_lib snd_ctl_led
> > kvm_intel input_leds mc snd_hwdep led_class snd_mpu401_uart
> > [ 4624.765400] snd_hda_core joydev snd_rawmidi rfkill kvm snd_seq
> > snd_seq_device gpio_ich snd_pcm pktcdvd iTCO_wdt snd_hrtimer r8169
> > snd_timer intel_cstate realtek snd mdio_devres intel_uncore libphy
> > i2c_i801 soundcore lpc_ich tiny_power_button mxm_wmi i7core_edac
> > acpi_cpufreq i2c_smbus pcspkr button nfsd auth_rpcgss nfs_acl lockd
> > grace dm_mod fuse sunrpc loop configfs dax nfnetlink zram zsmalloc
> > ext4 crc32c_generic mbcache jbd2 amdgpu video amdxcp i2c_algo_bit
> > mfd_core drm_ttm_helper ttm drm_exec gpu_sched hid_microsoft
> > drm_suballoc_helper drm_buddy drm_display_helper drm_kms_helper usbhid
> > sr_mod sd_mod cdrom drm pata_jmicron ahci libahci uhci_hcd xhci_pci
> > libata ehci_pci ehci_hcd xhci_hcd scsi_mod firewire_ohci psmouse
> > firewire_core usbcore crc32c_intel sha512_ssse3 sha256_ssse3 bsg
> > serio_raw sha1_ssse3 drm_panel_orientation_quirks scsi_common crc16
> > usb_common crc_itu_t wmi msr gf128mul crypto_simd cryptd
> > [ 4624.932496] CPU: 2 UID: 0 PID: 9069 Comm: kworker/u32:3 Tainted: G
> > I 6.12.4 #1
> > [ 4624.939803] Tainted: [I]=FIRMWARE_WORKAROUND
> > [ 4624.942773] Hardware name: Gigabyte Technology Co., Ltd.
> > EX58-UD3R/EX58-UD3R, BIOS FB 05/04/2009
> > [ 4624.950340] Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
> > [ 4624.954967] RIP: 0010:check_flush_dependency+0xbe/0xd0
> > [ 4624.958806] Code: 75 2a 48 8b 55 18 48 8d 8b c8 00 00 00 4d 89 e0
> > 48 81 c6 c8 00 00 00 48 c7 c7 1b d6 e9 81 c6 05 a3 5f 56 01 01 e8 32
> > 30 fe ff <0f> 0b 5b 5d 41 5c c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90
> > 90 90
> > [ 4624.976253] RSP: 0018:ffffc9000bec7c88 EFLAGS: 00010086
> > [ 4624.980177] RAX: 0000000000000000 RBX: ffff888100118000 RCX: 0000000000000027
> > [ 4624.986003] RDX: 0000000000000003 RSI: ffffffff81eab2b9 RDI: 00000000ffffffff
> > [ 4624.991835] RBP: ffff888155daa900 R08: 0000000000000000 R09: 7067646d61006600
> > [ 4624.997668] R10: 0000000000000091 R11: fefefefefefefeff R12: ffffffffa05ec880
> > [ 4625.003501] R13: 0000000000000001 R14: ffff88810011c600 R15: ffff888163600000
> > [ 4625.009334] FS: 0000000000000000(0000) GS:ffff888343c80000(0000)
> > knlGS:0000000000000000
> > [ 4625.016118] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 4625.020555] CR2: 0000000099837000 CR3: 0000000144e4c000 CR4: 00000000000026f0
> > [ 4625.026381] Call Trace:
> > [ 4625.027525] <TASK>
> > [ 4625.028323] ? __warn+0x90/0x120
> > [ 4625.030255] ? report_bug+0xe2/0x160
> > [ 4625.032532] ? check_flush_dependency+0xbe/0xd0
> > [ 4625.035768] ? handle_bug+0x53/0x80
> > [ 4625.037959] ? exc_invalid_op+0x13/0x60
> > [ 4625.040499] ? asm_exc_invalid_op+0x16/0x20
> > [ 4625.043384] ? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu]
> > [ 4625.049366] ? check_flush_dependency+0xbe/0xd0
> > [ 4625.052598] ? check_flush_dependency+0xbe/0xd0
> > [ 4625.055830] __flush_work+0xb2/0x1f0
> > [ 4625.058109] ? work_grab_pending+0x3f/0x120
> > [ 4625.060996] ? set_work_pool_and_clear_pending+0x14/0x20
> > [ 4625.065008] ? __cancel_work+0x89/0xc0
> > [ 4625.067460] __cancel_work_sync+0x4a/0x70
> > [ 4625.070173] amdgpu_gfx_off_ctrl+0xa6/0x100 [amdgpu]
> > [ 4625.074231] amdgpu_ring_alloc+0x52/0x60 [amdgpu]
> > [ 4625.077974] amdgpu_ib_schedule+0x155/0x640 [amdgpu]
> > [ 4625.081988] amdgpu_job_run+0xda/0x140 [amdgpu]
> > [ 4625.085663] drm_sched_run_job_work+0x246/0x310 [gpu_sched]
> > [ 4625.089935] process_scheduled_works+0x19c/0x2c0
> > [ 4625.093252] worker_thread+0x13b/0x1c0
> > [ 4625.095706] ? __pfx_worker_thread+0x10/0x10
> > [ 4625.098678] kthread+0xef/0x100
> > [ 4625.100523] ? __pfx_kthread+0x10/0x10
> > [ 4625.102976] ret_from_fork+0x24/0x40
> > [ 4625.105256] ? __pfx_kthread+0x10/0x10
> > [ 4625.107709] ret_from_fork_asm+0x1a/0x30
> > [ 4625.110338] </TASK>
> > [ 4625.111228] ---[ end trace 0000000000000000 ]---
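
For anyone reading the trace above without the workqueue code at hand, here is a rough userspace sketch of the condition that the "WQ_MEM_RECLAIM ... is flushing !WQ_MEM_RECLAIM" message reports. It is a simplified model, not the kernel source: struct model_wq, MODEL_WQ_MEM_RECLAIM and model_flush_would_warn() are hypothetical names invented for illustration, the flag value is arbitrary, and the real check_flush_dependency() has further conditions (for example around PF_MEMALLOC tasks and legacy workqueues) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's WQ_MEM_RECLAIM flag; value is arbitrary. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Minimal model of a workqueue: only a name and a flag word. */
struct model_wq {
    const char *name;
    unsigned int flags;
};

/*
 * Returns true when this model would warn.
 *
 * flusher_wq: queue whose worker is doing the flush, or NULL if the flush
 *             is not issued from a workqueue worker at all.
 * target_wq:  queue that owns the work item being flushed.
 *
 * The model warns only when a reclaim-safe queue waits on a queue that is
 * not reclaim-safe, since the target may be unable to make forward
 * progress under memory pressure.
 */
static bool model_flush_would_warn(const struct model_wq *flusher_wq,
                                   const struct model_wq *target_wq)
{
    /* Flushing a reclaim-safe queue is fine from any context. */
    if (target_wq->flags & MODEL_WQ_MEM_RECLAIM)
        return false;

    /* This model does not consider callers that are not workqueue workers. */
    if (flusher_wq == NULL)
        return false;

    /* Reclaim-safe flusher waiting on a non-reclaim-safe target. */
    return (flusher_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

In terms of the log, the flusher corresponds to sdma0 (marked WQ_MEM_RECLAIM by the commit quoted above) and the target to events, which is not. The sketch only shows why the check fires; it says nothing about whether the warning is a false positive in the cancel_delayed_work_sync() case, which is the open question in this thread.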