On Wed, Sep 14, 2022 at 03:51:42PM -0700, Andrew Morton wrote:
> On Wed, 14 Sep 2022 10:33:18 +0800 Hongchen Zhang <zhanghongchen@xxxxxxxxxxx> wrote:
>
> > when a process falls into page fault and there is not enough free
> > memory,it will do direct reclaim. At the same time,it is holding
> > mmap_lock.So in case of multi-thread,it should exit from page fault
> > ASAP.
> > When reclaim memory,we do scan adjust between anon and file lru which
> > may cost too much time and trigger hung task for other thread.So for a
> > process which is not kswapd,it should just do a little scan adjust.
>
> Well, that's a pretty nasty bug. Before diving into a possible fix,
> can you please tell us more about how this happens? What sort of
> machine, what sort of workload. Can you suggest why others are not
> experiencing this?

One thing I'd like to know is whether the page fault is for an anonymous
or file-backed page. We already drop the mmap_lock for doing file I/O
(or we should ...) and maybe we also need to drop the mmap_lock for doing
direct reclaim?