On 2022/03/17 23:38, Dan Schatzberg wrote:
> On Thu, Mar 17, 2022 at 11:08:04PM +0900, Tetsuo Handa wrote:
>> Commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
>> again started using per device workqueue
>>
>> -    lo->worker_task = kthread_run(loop_kthread_worker_fn,
>> -                    &lo->worker, "loop%d", lo->lo_number);
>> +    lo->workqueue = alloc_workqueue("loop%d",
>> +                    WQ_UNBOUND | WQ_FREEZABLE,
>> +                    0,
>> +                    lo->lo_number);
>>
>> but forgot to restore WQ_MEM_RECLAIM flag.
>
> Early versions of the patch did have WQ_MEM_RECLAIM but it was
> removed. I looked up my notes and found this:
>
> Changes since V11:
>
> * Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this
>   can be driven by writeback, but this was causing a warning in xfs
>   and likely other filesystems aren't equipped to be driven by reclaim
>   at the VFS layer.
>
> I don't know if this is still the case. Can you test it?

Well, indeed there was a WQ_MEM_RECLAIM flag in
https://lore.kernel.org/all/20210316153655.500806-2-schatzberg.dan@xxxxxxxxx/
and the warning you are referring to seems to be
https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .

----------------------------------------
[ 49.143055] run fstests generic/361 at 2021-03-19 23:56:49
[ 49.418726] loop: module loaded
[ 49.642111] XFS (sda4): Mounting V5 Filesystem
[ 49.761139] XFS (sda4): Ending clean mount
[ 49.768167] xfs filesystem being mounted at /fs/scratch supports timestamps until 2038 (0x7fffffff)
[ 49.799002] loop0: detected capacity change from 0 to 2097152
[ 50.027339] XFS (loop0): Mounting V5 Filesystem
[ 50.033830] XFS (loop0): Ending clean mount
[ 50.038232] xfs filesystem being mounted at /fs/scratch/mnt supports timestamps until 2038 (0x7fffffff)
[ 50.239610] XFS: attr2 mount option is deprecated.
[ 50.423584] ------------[ cut here ]------------
[ 50.428278] workqueue: WQ_MEM_RECLAIM loop0:loop_rootcg_workfn [loop] is flushing !WQ_MEM_RECLAIM xfs-sync/sda4:xfs_flush_inodes_worker [xfs]
[ 50.428387] WARNING: CPU: 0 PID: 35 at kernel/workqueue.c:2613 check_flush_dependency+0x116/0x140
[ 50.450013] Modules linked in: loop xfs dm_mod btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c sd_mod t10_pi sg ipmi_devintf ipmi_msghandler intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal i915 intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul hp_wmi sparse_keymap intel_gtt crc32c_intel ghash_clmulni_intel mei_wdt rfkill wmi_bmof rapl drm_kms_helper ahci intel_cstate syscopyarea mei_me libahci sysfillrect sysimgblt fb_sys_fops intel_uncore serio_raw mei drm libata intel_pch_thermal ie31200_edac wmi video tpm_infineon intel_pmc_core acpi_pad ip_tables
[ 50.500731] CPU: 0 PID: 35 Comm: kworker/u8:3 Not tainted 5.12.0-rc2-00093-geaba74271070 #1
[ 50.509081] Hardware name: HP HP Z238 Microtower Workstation/8183, BIOS N51 Ver. 01.63 10/05/2017
[ 50.517963] Workqueue: loop0 loop_rootcg_workfn [loop]
[ 50.523109] RIP: 0010:check_flush_dependency+0x116/0x140
[ 50.528427] Code: 8d 8b b0 00 00 00 48 8b 50 18 49 89 e8 48 8d b1 b0 00 00 00 48 c7 c7 80 ed 34 82 4c 89 c9 c6 05 8a 11 ab 01 01 e8 29 9a a9 00 <0f> 0b e9 07 ff ff ff 80 3d 78 11 ab 01 00 75 92 e9 3b ff ff ff 66
[ 50.547207] RSP: 0018:ffffc9000018fba0 EFLAGS: 00010086
[ 50.552434] RAX: 0000000000000000 RBX: ffff88843a39b600 RCX: 0000000000000027
[ 50.559567] RDX: 0000000000000027 RSI: 00000000ffff7fff RDI: ffff88843f4177f8
[ 50.566713] RBP: ffffffffc091c760 R08: ffff88843f4177f0 R09: ffffc9000018f9b8
[ 50.573875] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88810d0fa800
[ 50.581006] R13: 0000000000000001 R14: ffff88843f42ab80 R15: ffff88810d0fa800
[ 50.588139] FS:  0000000000000000(0000) GS:ffff88843f400000(0000) knlGS:0000000000000000
[ 50.596228] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 50.601975] CR2: 00007fbda63ea7f8 CR3: 000000043da0a002 CR4: 00000000003706f0
[ 50.609105] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 50.616240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 50.623379] Call Trace:
[ 50.625837]  __flush_work+0xaf/0x220
[ 50.629418]  ? __queue_work+0x142/0x400
[ 50.633261]  xfs_file_buffered_write+0x1d0/0x2c0 [xfs]
[ 50.638468]  do_iter_readv_writev+0x157/0x1c0
[ 50.642833]  do_iter_write+0x7d/0x1c0
[ 50.646513]  lo_write_bvec+0x62/0x1a0 [loop]
[ 50.650804]  loop_process_work+0x20d/0xb80 [loop]
[ 50.655543]  ? newidle_balance+0x1f0/0x400
[ 50.659647]  process_one_work+0x1ed/0x3c0
[ 50.663696]  worker_thread+0x50/0x3c0
[ 50.667365]  ? process_one_work+0x3c0/0x3c0
[ 50.671568]  kthread+0x116/0x160
[ 50.674813]  ? kthread_park+0xa0/0xa0
[ 50.678476]  ret_from_fork+0x22/0x30
[ 50.682070] ---[ end trace cdbf6e08ae434f6c ]---
[ 56.951496] loop: Write error at byte offset 1032134656, length 4096.
[ 56.957973] blk_update_request: I/O error, dev loop0, sector 2015816 op 0x1:(WRITE) flags 0x4000 phys_seg 21 prio class 0
----------------------------------------

Since xfs_init_mount_workqueues() as of 5.17-rc8 is still doing

    mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s", XFS_WQFLAGS(WQ_FREEZABLE),
            0, mp->m_super->s_id);

this will still be the case.

On the other hand, scsi_add_host_with_dma() in drivers/scsi/hosts.c is doing

    shost->work_q = alloc_workqueue("%s",
            WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
            1, shost->work_q_name);

iscsi_host_alloc() in drivers/scsi/libiscsi.c is doing

    ihost->workq = alloc_workqueue("%s",
            WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
            1, ihost->workq_name);

iscsi_transport_init() in drivers/scsi/scsi_transport_iscsi.c is doing

    iscsi_eh_timer_workq = alloc_workqueue("%s",
            WQ_SYSFS | __WQ_LEGACY | WQ_MEM_RECLAIM | WQ_UNBOUND,
            1, "iscsi_eh");

and include/linux/workqueue.h is doing

    #define create_workqueue(name)                                  \
            alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))

so I guess that also adding the __WQ_LEGACY flag would avoid hitting the

    WARN_ONCE(worker && ((worker->current_pwq->wq->flags &
                          (WQ_MEM_RECLAIM | __WQ_LEGACY)) == WQ_MEM_RECLAIM),
              "workqueue: WQ_MEM_RECLAIM %s:%ps is flushing !WQ_MEM_RECLAIM %s:%ps",
              worker->current_pwq->wq->name, worker->current_func,
              target_wq->name, target_func);

warning in check_flush_dependency(). But since include/linux/workqueue.h only says

    __WQ_LEGACY             = 1 << 18, /* internal: create*_workqueue() */

I can't tell when __WQ_LEGACY and WQ_MEM_RECLAIM should not be specified together...

Tejun, what is the intent of this warning? Can the description of the __WQ_LEGACY flag be updated?
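To make that concrete, taking the same route as the SCSI callers above would mean something like this on the loop side (an untested sketch only; whether this is a legitimate use of __WQ_LEGACY, or just abuse of an internal flag, is exactly what I am asking):

    lo->workqueue = alloc_workqueue("loop%d",
                    WQ_UNBOUND | WQ_FREEZABLE |
                    __WQ_LEGACY | WQ_MEM_RECLAIM,
                    0, lo->lo_number);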
I think the loop module had better reserve one "struct task_struct" for each loop device. I guess that, in general, waiting for a work item on a !WQ_MEM_RECLAIM WQ from a WQ_MEM_RECLAIM WQ is dangerous, because under memory pressure that work item may not be able to find a "struct task_struct" to process it.

Then, what we should do is to create mp->m_sync_workqueue with the WQ_MEM_RECLAIM flag added, instead of creating lo->workqueue with __WQ_LEGACY + WQ_MEM_RECLAIM flags added...

Is the __WQ_LEGACY + WQ_MEM_RECLAIM combination just a hack for silencing this warning without fixing the various WQs used by xfs and other filesystems?
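To illustrate the alternative I have in mind, the xfs side would become something like this (again an untested sketch based on the 5.17-rc8 code quoted above; whether xfs and the other filesystems actually want their flush/sync workqueues to be driven by memory reclaim is the part I can't judge):

    mp->m_sync_workqueue = alloc_workqueue("xfs-sync/%s",
            XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM),
            0, mp->m_super->s_id);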