kernfs notify is used in the md write path (md_write_start) to wake up
a userspace daemon such as "mdmon", which updates the md superblock of
IMSM RAID. The md write waits for that update to finish before issuing
the write. If the write is issued for memory reclaim, the system may
hang because the kernfs notify cannot be executed: it runs on
"system_wq", which has no rescuer thread, and a new kworker thread may
not be created under memory pressure. The userspace daemon is then
never woken up and the md write hangs.

According to Tejun, this can't be fixed by adding WQ_MEM_RECLAIM to
"system_wq", because that workqueue is shared and someone else might
occupy the rescuer thread. Fixing this from the md side would require
replacing kernfs notify with another way of communicating with the
userspace daemon, which would break the userspace interface. So use a
separate workqueue for kernfs notify to allow it to be used in memory
reclaim context.

Link: https://lore.kernel.org/all/a131af22-0a5b-4be1-b77e-8716c63e8883@xxxxxxxxxx/T/
Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
---
 fs/kernfs/file.c            | 2 +-
 fs/kernfs/kernfs-internal.h | 1 +
 fs/kernfs/mount.c           | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index f0cb729e9a97..726bfd40a912 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -974,7 +974,7 @@ void kernfs_notify(struct kernfs_node *kn)
 		kernfs_get(kn);
 		kn->attr.notify_next = kernfs_notify_list;
 		kernfs_notify_list = kn;
-		schedule_work(&kernfs_notify_work);
+		queue_work(kernfs_wq, &kernfs_notify_work);
 	}
 	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
 }
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 237f2764b941..beae5d328342 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -123,6 +123,7 @@ static inline bool kernfs_dir_changed(struct kernfs_node *parent,
 
 extern const struct super_operations kernfs_sops;
 extern struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
+extern struct workqueue_struct *kernfs_wq;
 
 /*
  * inode.c
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 4628edde2e7e..7346ec49a621 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -24,6 +24,7 @@
 struct kmem_cache *kernfs_node_cache __ro_after_init;
 struct kmem_cache *kernfs_iattrs_cache __ro_after_init;
 struct kernfs_global_locks *kernfs_locks __ro_after_init;
+struct workqueue_struct *kernfs_wq __ro_after_init;
 
 static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
 {
@@ -432,4 +433,6 @@ void __init kernfs_init(void)
 					      0, SLAB_PANIC, NULL);
 
 	kernfs_lock_init();
+
+	kernfs_wq = alloc_workqueue("kernfs", WQ_MEM_RECLAIM, 0);
 }
-- 
2.39.3 (Apple Git-145)
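
The forward-progress argument in the commit message can be illustrated with a toy user-space model. This is only a sketch and not kernel code: `toy_wq`, `toy_create_worker`, `toy_queue_work`, `toy_notify` and `toy_demo` are hypothetical names, and the model assumes that worker creation fails whenever memory is unavailable and that a rescuer is a worker allocated in advance. It compares a shared queue without a rescuer, a shared queue whose rescuer is occupied by another user, and a dedicated queue with its own rescuer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy user-space model, NOT kernel code. Every name below is hypothetical
 * and exists only for illustration.
 */
struct toy_wq {
	size_t idle_workers;	/* workers that already exist and are free */
	bool has_rescuer;	/* models WQ_MEM_RECLAIM: pre-allocated worker */
	bool rescuer_busy;	/* rescuer occupied by another user's work item */
};

/* Assumption: creating a worker needs memory, so it fails under pressure. */
bool toy_create_worker(bool memory_available)
{
	return memory_available;
}

/* Run fn(arg) if any execution context exists; return whether it ran. */
bool toy_queue_work(const struct toy_wq *wq, bool memory_available,
		    void (*fn)(void *), void *arg)
{
	bool can_run = wq->idle_workers > 0 ||
		       toy_create_worker(memory_available) ||
		       (wq->has_rescuer && !wq->rescuer_busy);

	if (can_run)
		fn(arg);
	return can_run;	/* false: the item stays queued and the waiter hangs */
}

/* Stands in for the notify work that wakes the userspace daemon. */
void toy_notify(void *arg)
{
	*(bool *)arg = true;
}

/* Compare the three cases under memory pressure (no memory, no idle worker). */
void toy_demo(void)
{
	struct toy_wq shared = { 0, false, false };
	struct toy_wq shared_reclaim = { 0, true, true };
	struct toy_wq dedicated = { 0, true, false };
	bool woken = false;

	assert(!toy_queue_work(&shared, false, toy_notify, &woken));
	assert(!toy_queue_work(&shared_reclaim, false, toy_notify, &woken));
	assert(!woken);
	assert(toy_queue_work(&dedicated, false, toy_notify, &woken));
	assert(woken);
}
```

In this model only the dedicated queue runs the notify work when no memory and no idle worker are available, which is the property the patch relies on when it allocates "kernfs" with WQ_MEM_RECLAIM.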