Hi, Michal On Thu, Jan 9, 2020 at 11:38 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > [CC Mel and Vlastimil] > > On Thu 09-01-20 14:56:46, Cong Wang wrote: > > We observed kcompactd hung at __lock_page(): > > > > INFO: task kcompactd0:57 blocked for more than 120 seconds. > > Not tainted 4.19.56.x86_64 #1 > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > kcompactd0 D 0 57 2 0x80000000 > > Call Trace: > > ? __schedule+0x236/0x860 > > schedule+0x28/0x80 > > io_schedule+0x12/0x40 > > __lock_page+0xf9/0x120 > > ? page_cache_tree_insert+0xb0/0xb0 > > ? update_pageblock_skip+0xb0/0xb0 > > migrate_pages+0x88c/0xb90 > > ? isolate_freepages_block+0x3b0/0x3b0 > > compact_zone+0x5f1/0x870 > > kcompactd_do_work+0x130/0x2c0 > > ? __switch_to_asm+0x35/0x70 > > ? __switch_to_asm+0x41/0x70 > > ? kcompactd_do_work+0x2c0/0x2c0 > > ? kcompactd+0x73/0x180 > > kcompactd+0x73/0x180 > > ? finish_wait+0x80/0x80 > > kthread+0x113/0x130 > > ? kthread_create_worker_on_cpu+0x50/0x50 > > ret_from_fork+0x35/0x40 > > > > which faddr2line maps to: > > > > migrate_pages+0x88c/0xb90: > > lock_page at include/linux/pagemap.h:483 > > (inlined by) __unmap_and_move at mm/migrate.c:1024 > > (inlined by) unmap_and_move at mm/migrate.c:1189 > > (inlined by) migrate_pages at mm/migrate.c:1419 > > > > Sometimes kcompactd eventually got out of this situation, sometimes not. > > What does this mean exactly? Who is holding the page lock? As I explained in other email, I didn't locate the process holding the page lock before I sent out this patch, as I was fooled by /proc/X/stack. But now I got its stack trace with `perf`: ffffffffa722aa06 shrink_inactive_list ffffffffa722b3d7 shrink_node_memcg ffffffffa722b85f shrink_node ffffffffa722bc89 do_try_to_free_pages ffffffffa722c179 try_to_free_mem_cgroup_pages ffffffffa7298703 try_charge ffffffffa729a886 mem_cgroup_try_charge ffffffffa720ec03 __add_to_page_cache_locked ffffffffa720ee3a add_to_page_cache_lru ffffffffa7312ddb iomap_readpages_actor ffffffffa73133f7 iomap_apply ffffffffa73135da iomap_readpages ffffffffa722062e read_pages ffffffffa7220b3f __do_page_cache_readahead ffffffffa7210554 filemap_fault ffffffffc039e41f __xfs_filemap_fault ffffffffa724f5e7 __do_fault ffffffffa724c5f2 __handle_mm_fault ffffffffa724cbc6 handle_mm_fault ffffffffa70a313e __do_page_fault ffffffffa7a00dfe page_fault It got stuck somewhere along the call path of mem_cgroup_try_charge(), and the trace events of mm_vmscan_lru_shrink_inactive() indicates this too: <...>-455459 [003] .... 2691911.664706: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664711: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664714: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=2 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664717: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664720: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664725: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=7 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664730: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664732: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664736: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664739: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=2 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664744: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=5 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664747: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=4 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664752: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=12 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activa te=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664755: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664761: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=2 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664762: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664764: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664770: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=4 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=1 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664777: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=21 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activa te=0 nr_ref_keep=0 nr_unmap_fail=0 priority=0 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664780: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=4 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC <...>-455459 [003] .... 2691911.664783: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=2 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activat e=0 nr_ref_keep=0 nr_unmap_fail=0 priority=3 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC Thanks.