On Tue, Jan 28, 2020 at 11:44 AM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> On Tue, Jan 28, 2020 at 3:39 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > On Tue 28-01-20 02:48:57, Matthew Wilcox wrote:
> > > Doesn't the stack trace above indicate that we're doing migration as
> > > the result of an allocation in add_to_page_cache_lru()?
> >
> > Which stack trace do you refer to? Because the one above doesn't show
> > much more beyond mem_cgroup_iter and likewise others in this email
> > thread. I do not really remember any stack with lock_page on the trace.
>
> I think the page is locked in add_to_page_cache_lru() by
> __SetPageLocked(), as the stack trace shows __add_to_page_cache_locked().
> It is not yet unlocked, as it is still looping inside try_charge().
>
> I will write a script to see if I can find the longest time spent in reclaim
> as you suggested.

After digging through the changelog, I believe the following commit could
fix the problem:

commit f9c645621a28e37813a1de96d9cbd89cde94a1e4
Author: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date:   Mon Sep 23 15:37:08 2019 -0700

    memcg, oom: don't require __GFP_FS when invoking memcg OOM killer

which is not in our 4.19 branch yet. We will sync with 4.19 stable soon.

Thanks.