On Tue, Mar 22, 2022 at 12:19:40PM -1000, Tejun Heo wrote:
> On Wed, Mar 23, 2022 at 07:05:56AM +0900, Tetsuo Handa wrote:
> > > Hmmm... yeah, I actually don't know the exact dependency here and the
> > > dependency may not be real - e.g. the conclusion might be that loop is
> > > conflating different uses and needs to split its use of workqueues into two
> > > separate ones. Tetsuo, can you post more details on the warning that you're
> > > seeing?
> > >
> >
> > It was reported at https://lore.kernel.org/all/20210322060334.GD32426@xsang-OptiPlex-9020/ .
>
> Looks like a correct dependency to me. The work item is being flushed from
> good old write path. Dave?

The filesystem buffered write IO path isn't part of memory reclaim -
it's a user IO path and I think most filesystems will treat it that
way.

We've had similar layering problems with the loop IO path implying
GFP_NOFS must be used by filesystems allocating memory in the IO
path - we solved that by requiring the loop IO submission context
(loop_process_work()) to set PF_MEMALLOC_NOIO so that it didn't
deadlock anywhere in the underlying filesystems that have no idea
that the loop device has added memory reclaim constraints to the IO
path.

This seems like it's the same layering problem - syscall facing IO
paths are designed for incoming IO from user context, not outgoing
writeback IO from memory reclaim contexts. Memory reclaim contexts
are supposed to use back end filesystem operations like
->writepages() to flush dirty data when necessary.

If the loop device IO mechanism means that every ->write_iter path
needs to be considered as directly in the memory reclaim path, then
that means a huge amount of the kernel needs to be considered as "in
memory reclaim". i.e. it's not just this one XFS workqueue that is
going to have this problem - it's any workqueue that can be waited on
by the incoming IO path.

For example, a network filesystem might put the network stack directly
in the IO path.
Which means if we then put loop on top of that filesystem, various
workqueues in the network stack may now need to be considered as
running under the memory reclaim path because of the loop block
device.

I don't know what the solution is, but if the fix is "xfs needs to
mark a workqueue that has nothing to do with memory reclaim as
WQ_MEM_RECLAIM because of the loop device" then we're talking about
playing workqueue whack-a-mole across the entire kernel forever
more....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx