On Sun 26-01-20 15:39:35, Matthew Wilcox wrote:
> On Sun, Jan 26, 2020 at 11:53:55AM -0800, Cong Wang wrote:
> > On Tue, Jan 21, 2020 at 1:00 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Mon 20-01-20 14:48:05, Cong Wang wrote:
> > > > It got stuck somewhere along the call path of mem_cgroup_try_charge(),
> > > > and the trace events of mm_vmscan_lru_shrink_inactive() indicates this
> > > > too:
> > >
> > > So it seems that you are contending on the page lock. It is really
> > > unexpected that the reclaim would take that long though. Please try to
> > > enable more vmscan tracepoints to see where the time is spent.
> >
> > I suspect the process gets stuck in the retry loop in try_charge(), as
> > the _shortest_ stacktrace of the perf samples indicated:
> >
> > cycles:ppp:
> >         ffffffffa72963db mem_cgroup_iter
> >         ffffffffa72980ca mem_cgroup_oom_unlock
> >         ffffffffa7298c15 try_charge
> >         ffffffffa729a886 mem_cgroup_try_charge
> >         ffffffffa720ec03 __add_to_page_cache_locked
> >         ffffffffa720ee3a add_to_page_cache_lru
> >         ffffffffa7312ddb iomap_readpages_actor
> >         ffffffffa73133f7 iomap_apply
> >         ffffffffa73135da iomap_readpages
> >         ffffffffa722062e read_pages
> >         ffffffffa7220b3f __do_page_cache_readahead
> >         ffffffffa7210554 filemap_fault
> >         ffffffffc039e41f __xfs_filemap_fault
> >         ffffffffa724f5e7 __do_fault
> >         ffffffffa724c5f2 __handle_mm_fault
> >         ffffffffa724cbc6 handle_mm_fault
> >         ffffffffa70a313e __do_page_fault
> >         ffffffffa7a00dfe page_fault
> >
> > But I don't see how it could be, the only possible case is when
> > mem_cgroup_oom() returns OOM_SUCCESS. However I can't
> > find any clue in dmesg pointing to OOM. These processes in the
> > same memcg are either running or sleeping (that is not exiting or
> > coredump'ing), I don't see how and why they could be selected as
> > a victim of OOM killer. I don't see any signal pending either from
> > their /proc/X/status.
>
> I think this is a situation where we might end up with a genuine deadlock
> if we're not trylocking the pages. readahead allocates a batch of
> locked pages and adds them to the pagecache. If it has allocated,
> say, 5 pages, successfully inserted the first three into i_pages, then
> needs to allocate memory to insert the fourth one into i_pages, and
> the process then attempts to migrate the pages which are still locked,
> they will never come unlocked because they haven't yet been submitted
> to the filesystem for reading.

Just to make sure I understand. Do you mean this?

lock_page(A)
alloc_pages
  try_to_compact_pages
    compact_zone_order
      compact_zone(MIGRATE_SYNC_LIGHT)
        migrate_pages
          unmap_and_move
            __unmap_and_move
              lock_page(A)

-- 
Michal Hocko
SUSE Labs
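
For illustration only, the recursion in the call chain above can be modelled in user space. This is a minimal sketch, not kernel code: FakePage, migrate_trylock, migrate_blocking and alloc_while_holding are hypothetical names, a non-reentrant threading.Lock stands in for the page lock, and the timeout exists only so the demo terminates where a real sleeping lock would wait forever.

```python
import threading


class FakePage:
    """Hypothetical stand-in for a page and its page lock (not a kernel structure)."""

    def __init__(self, name):
        self.name = name
        # threading.Lock is non-reentrant: the holder cannot take it a second time.
        self.lock = threading.Lock()


def migrate_trylock(page):
    """Model a migration pass that only trylocks. Returns True if the page was handled."""
    if not page.lock.acquire(blocking=False):
        return False  # page is locked by someone (possibly us): skip it
    try:
        return True
    finally:
        page.lock.release()


def migrate_blocking(page, timeout=0.1):
    """Model a migration pass that sleeps on the page lock.

    The timeout is an assumption made for the demo; with an unbounded wait
    this call would never return when the caller already holds the lock.
    """
    if not page.lock.acquire(timeout=timeout):
        return False  # gave up waiting: stands in for the deadlock
    try:
        return True
    finally:
        page.lock.release()


def alloc_while_holding(page, migrate):
    """Model lock_page(A) followed by an allocation that reaches page A again."""
    with page.lock:
        return migrate(page)


if __name__ == "__main__":
    a = FakePage("A")
    print("trylock while holding A: ", alloc_while_holding(a, migrate_trylock))
    print("blocking while holding A:", alloc_while_holding(a, migrate_blocking))
    print("trylock with A unlocked: ", migrate_trylock(a))
```

Both attempts made while A is held fail to take the lock. The trylock variant returns at once and the caller can move on to another page, while the blocking variant only returns here because of the artificial timeout.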