On Tue, Jan 21, 2020 at 1:00 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Mon 20-01-20 14:48:05, Cong Wang wrote:
> > It got stuck somewhere along the call path of mem_cgroup_try_charge(),
> > and the trace events of mm_vmscan_lru_shrink_inactive() indicate this
> > too:
>
> So it seems that you are contending on the page lock. It is really
> unexpected that the reclaim would take that long though. Please try to
> enable more vmscan tracepoints to see where the time is spent.

Sorry for the delay. I have been trying to collect more data in one shot.

This is a typical round of the loop after enabling all vmscan tracepoints:

          <...>-455450 [007] .... 4048595.842992: mm_vmscan_memcg_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE classzone_idx=4
          <...>-455450 [007] d... 4048595.842995: mm_vmscan_lru_isolate: isolate_mode=0 classzone=3 order=0 nr_requested=1 nr_scanned=1 nr_skipped=0 nr_taken=1 lru=inactive_file
          <...>-455450 [007] .... 4048595.842995: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
          <...>-455450 [007] dN.. 4048595.842997: mm_vmscan_lru_isolate: isolate_mode=0 classzone=3 order=0 nr_requested=1 nr_scanned=1 nr_skipped=0 nr_taken=1 lru=inactive_file
          <...>-455450 [007] .... 4048595.843001: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
          <...>-455450 [007] d... 4048595.843002: mm_vmscan_lru_isolate: isolate_mode=0 classzone=3 order=0 nr_requested=5 nr_scanned=5 nr_skipped=0 nr_taken=5 lru=inactive_file
          <...>-455450 [007] .... 4048595.843004: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
          <...>-455450 [007] d... 4048595.843006: mm_vmscan_lru_isolate: isolate_mode=0 classzone=3 order=0 nr_requested=9 nr_scanned=9 nr_skipped=0 nr_taken=9 lru=inactive_file
          <...>-455450 [007] .... 4048595.843007: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=9 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
          <...>-455450 [007] d... 4048595.843009: mm_vmscan_lru_isolate: isolate_mode=0 classzone=3 order=0 nr_requested=17 nr_scanned=15 nr_skipped=0 nr_taken=15 lru=inactive_file
          <...>-455450 [007] .... 4048595.843011: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=15 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
          <...>-455450 [007] .... 4048595.843012: mm_vmscan_memcg_reclaim_end: nr_reclaimed=0

The whole trace output is huge (33M); I can provide it on request.
I suspect the process gets stuck in the retry loop in try_charge(), as the
_shortest_ stack trace of the perf samples indicates:

cycles:ppp:
        ffffffffa72963db mem_cgroup_iter
        ffffffffa72980ca mem_cgroup_oom_unlock
        ffffffffa7298c15 try_charge
        ffffffffa729a886 mem_cgroup_try_charge
        ffffffffa720ec03 __add_to_page_cache_locked
        ffffffffa720ee3a add_to_page_cache_lru
        ffffffffa7312ddb iomap_readpages_actor
        ffffffffa73133f7 iomap_apply
        ffffffffa73135da iomap_readpages
        ffffffffa722062e read_pages
        ffffffffa7220b3f __do_page_cache_readahead
        ffffffffa7210554 filemap_fault
        ffffffffc039e41f __xfs_filemap_fault
        ffffffffa724f5e7 __do_fault
        ffffffffa724c5f2 __handle_mm_fault
        ffffffffa724cbc6 handle_mm_fault
        ffffffffa70a313e __do_page_fault
        ffffffffa7a00dfe page_fault

But I don't see how that could be: the only case I can see that keeps
looping back to retry is mem_cgroup_oom() returning OOM_SUCCESS. However,
I can't find any clue in dmesg pointing to an OOM kill. The processes in
this memcg are either running or sleeping (that is, not exiting or
coredump'ing), so I don't see how or why they could be selected as OOM
victims. I don't see any pending signal in their /proc/X/status either.

Here is the status of the whole memcg:

$ sudo cat /sys/fs/cgroup/memory/system.slice/osqueryd.service/memory.oom_control
oom_kill_disable 0
under_oom 0
oom_kill 0

$ sudo cat /sys/fs/cgroup/memory/system.slice/osqueryd.service/memory.stat
cache 376832
rss 255864832
rss_huge 0
shmem 0
mapped_file 135168
dirty 135168
writeback 0
pgpgin 5327157
pgpgout 5264598
pgfault 4251687
pgmajfault 7491
inactive_anon 4096
active_anon 256184320
inactive_file 32768
active_file 0
unevictable 0
hierarchical_memory_limit 262144000
total_cache 376832
total_rss 255864832
total_rss_huge 0
total_shmem 0
total_mapped_file 135168
total_dirty 135168
total_writeback 0
total_pgpgin 5327157
total_pgpgout 5264598
total_pgfault 4251687
total_pgmajfault 7491
total_inactive_anon 4096
total_active_anon 256184320
total_inactive_file 86016
total_active_file 0
total_unevictable 0

Please let me know if I can provide anything else. Thanks.
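P.S. For reference, this is roughly how I read the tail of try_charge() in
mm/memcontrol.c, which the reasoning above is based on. It is a
hand-simplified paraphrase of a 5.x tree written from memory, not the
literal upstream code, so please correct me if I misread the control flow:

retry:
	if (page_counter_try_charge(&memcg->memory, batch, &counter))
		goto done_restock;		/* charge fits under the limit */

	/* ... pick mem_over_limit, check GFP flags, etc. ... */

	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
						    gfp_mask, may_swap);

	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
		goto retry;

	/* ... several other "goto retry" conditions ... */

	if (nr_retries--)
		goto retry;

	/*
	 * Reclaim retries are exhausted at this point, so the only way back
	 * into the loop above is a successful memcg OOM kill:
	 */
	oom_status = mem_cgroup_oom(mem_over_limit, gfp_mask,
				    get_order(nr_pages * PAGE_SIZE));
	switch (oom_status) {
	case OOM_SUCCESS:
		nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
		goto retry;
	case OOM_FAILED:
		goto force;
	default:
		goto nomem;
	}

If that reading is right, the OOM_SUCCESS branch is the only post-retry
path back to retry, which is why the absence of any OOM kill in dmesg
confuses me.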