On 01/24/2017 12:51 AM, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> A 2100 second page allocation stall!
I think we finally figured out the problem using Tetsuo Handa's mallocwd
patch. It seems to be in the XFS direct reclaim path.
Here's how it goes: memory is low, rsync goes into direct reclaim and
takes the XFS reclaim mutex in xfs_reclaim_inodes_nr():
[24646.497290] MemAlloc: rsync(19706) flags=0x404840 switches=8692 seq=340 gfp=0x27080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK) order=0 delay=6795
[24646.497550] rsync           R    0 19706   5000 0x00000000
[24646.497690]  ffffa4361dc36c00 0000000c1dc36c00 ffffa4361dc36c00 ffffa44347792580
[24646.497951]  0000000000000000 ffffb1cab6343458 ffffffffb383e799 0000000000000000
[24646.498208]  ffffa443ffff7a00 0000000000000001 ffffb1cab6343448 0000000000000002
[24646.498462] Call Trace:
[24646.498587]  [<ffffffffb383e799>] ? __schedule+0x179/0x5c8
[24646.498718]  [<ffffffffb383ecc2>] ? schedule+0x32/0x80
[24646.498846]  [<ffffffffb3841229>] ? schedule_timeout+0x159/0x2a0
[24646.498977]  [<ffffffffb30f1450>] ? add_timer_on+0x130/0x130
[24646.499108]  [<ffffffffb318ff13>] ? __alloc_pages_nodemask+0xe73/0x16b0
[24646.499240]  [<ffffffffb31deb2a>] ? alloc_pages_current+0x9a/0x120
[24646.499371]  [<ffffffffb33d1a51>] ? xfs_buf_allocate_memory+0x171/0x2c0
[24646.499503]  [<ffffffffb337c48b>] ? xfs_buf_get_map+0x18b/0x1d0
[24646.499634]  [<ffffffffb337cdcb>] ? xfs_buf_read_map+0x3b/0x160
[24646.499765]  [<ffffffffb33b71b0>] ? xfs_trans_read_buf_map+0x1f0/0x490
[24646.499897]  [<ffffffffb3362859>] ? xfs_imap_to_bp+0x79/0x120
[24646.500029]  [<ffffffffb3394a38>] ? xfs_iflush+0x118/0x380
[24646.500158]  [<ffffffffb30d6130>] ? wake_atomic_t_function+0x40/0x40
[24646.500289]  [<ffffffffb3385af4>] ? xfs_reclaim_inode+0x274/0x3f0
[24646.500421]  [<ffffffffb3385e29>] ? xfs_reclaim_inodes_ag+0x1b9/0x2c0
[24646.500553]  [<ffffffffb345c6b8>] ? radix_tree_next_chunk+0x108/0x2a0
[24646.500685]  [<ffffffffb30f0e31>] ? lock_timer_base+0x51/0x70
[24646.500814]  [<ffffffffb345ca4e>] ? radix_tree_gang_lookup_tag+0xae/0x180
[24646.500946]  [<ffffffffb336f588>] ? xfs_perag_get_tag+0x48/0x100
[24646.501079]  [<ffffffffb31b63c1>] ? __list_lru_walk_one.isra.7+0x31/0x120
[24646.501212]  [<ffffffffb322aed0>] ? iget5_locked+0x240/0x240
[24646.501342]  [<ffffffffb33873c1>] ? xfs_reclaim_inodes_nr+0x31/0x40
[24646.501472]  [<ffffffffb3213160>] ? super_cache_scan+0x1a0/0x1b0
[24646.501603]  [<ffffffffb319bf62>] ? shrink_slab+0x262/0x440
[24646.501734]  [<ffffffffb319cc8b>] ? drop_slab_node+0x2b/0x60
[24646.501864]  [<ffffffffb319cd02>] ? drop_slab+0x42/0x70
[24646.501994]  [<ffffffffb318abc0>] ? out_of_memory+0x220/0x560
[24646.502122]  [<ffffffffb31903b2>] ? __alloc_pages_nodemask+0x1312/0x16b0
[24646.502255]  [<ffffffffb31deb2a>] ? alloc_pages_current+0x9a/0x120
[24646.502386]  [<ffffffffb304b803>] ? pte_alloc_one+0x13/0x40
[24646.502517]  [<ffffffffb31be21f>] ? handle_mm_fault+0xc7f/0x14b0
[24646.502648]  [<ffffffffb30461ff>] ? __do_page_fault+0x1cf/0x5a0
[24646.502777]  [<ffffffffb3843362>] ? page_fault+0x22/0x30
But it cannot get memory, because memory is low (which is why it is in
direct reclaim in the first place), so it stays blocked. Other processes
go into direct reclaim as well, but they can't even get past the mutex
in xfs_reclaim_inodes_nr():
[24646.502909] MemAlloc: rsync(19707) flags=0x404840 switches=6344 seq=638 gfp=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD) order=0 delay=3257 uninterruptible
[24646.503167] rsync           D    0 19707   5000 0x00000000
[24646.503299]  0000000000000000 ffffa44347793840 ffffa4277a36ec00 ffffa4361dc35100
[24646.503554]  ffffa4437a514300 ffffb1cab0823638 ffffffffb383e799 ffffa44365967b00
[24646.503809]  ffffb1cab0823680 0000000000000001 ffffa443424062a8 0000000000000000
[24646.504061] Call Trace:
[24646.504189]  [<ffffffffb383e799>] ? __schedule+0x179/0x5c8
[24646.504321]  [<ffffffffb383ecc2>] ? schedule+0x32/0x80
[24646.504451]  [<ffffffffb383ee6e>] ? schedule_preempt_disabled+0xe/0x20
[24646.504582]  [<ffffffffb38400aa>] ? __mutex_lock_slowpath+0x8a/0x100
[24646.504715]  [<ffffffffb3840133>] ? mutex_lock+0x13/0x22
[24646.504846]  [<ffffffffb3385e98>] ? xfs_reclaim_inodes_ag+0x228/0x2c0
[24646.504977]  [<ffffffffb345c6b8>] ? radix_tree_next_chunk+0x108/0x2a0
[24646.505107]  [<ffffffffb345ca4e>] ? radix_tree_gang_lookup_tag+0xae/0x180
[24646.505238]  [<ffffffffb336f588>] ? xfs_perag_get_tag+0x48/0x100
[24646.505370]  [<ffffffffb31b63c1>] ? __list_lru_walk_one.isra.7+0x31/0x120
[24646.505502]  [<ffffffffb322aed0>] ? iget5_locked+0x240/0x240
[24646.505634]  [<ffffffffb33873c1>] ? xfs_reclaim_inodes_nr+0x31/0x40
[24646.505766]  [<ffffffffb3213160>] ? super_cache_scan+0x1a0/0x1b0
[24646.505897]  [<ffffffffb319bf62>] ? shrink_slab+0x262/0x440
[24646.506028]  [<ffffffffb319fa6f>] ? shrink_node+0xef/0x2d0
[24646.506158]  [<ffffffffb319ff42>] ? do_try_to_free_pages+0xc2/0x2b0
[24646.506288]  [<ffffffffb31a03e2>] ? try_to_free_pages+0xe2/0x1c0
[24646.506415]  [<ffffffffb318c1a9>] ? __perform_reclaim.isra.80+0x79/0xc0
[24646.506542]  [<ffffffffb318f901>] ? __alloc_pages_nodemask+0x861/0x16b0
[24646.506669]  [<ffffffffb345d64b>] ? __radix_tree_lookup+0x7b/0xe0
[24646.506796]  [<ffffffffb31deb2a>] ? alloc_pages_current+0x9a/0x120
[24646.506922]  [<ffffffffb3188b56>] ? filemap_fault+0x396/0x540
[24646.507048]  [<ffffffffb345c6b8>] ? radix_tree_next_chunk+0x108/0x2a0
[24646.507176]  [<ffffffffb32a5bdf>] ? ext4_filemap_fault+0x3f/0x60
[24646.507303]  [<ffffffffb31b8591>] ? __do_fault+0x71/0x120
[24646.507428]  [<ffffffffb31be36d>] ? handle_mm_fault+0xdcd/0x14b0
[24646.507555]  [<ffffffffb31ea9a4>] ? kmem_cache_free+0x204/0x220
[24646.507682]  [<ffffffffb32114b9>] ? __fput+0x149/0x200
[24646.507806]  [<ffffffffb30461ff>] ? __do_page_fault+0x1cf/0x5a0
[24646.507934]  [<ffffffffb3843362>] ? page_fault+0x22/0x30
Even kswapd gets stuck:
[24656.991375] MemAlloc: kswapd0(114) flags=0xa40840 switches=22109 uninterruptible
[24656.991622] kswapd0         D    0   114      2 0x00000000
[24656.991753]  0000000000000000 ffffa43830be0f00 ffffa44342ff2880 ffffa443600e0d80
[24656.992000]  ffffa4437ad14300 ffffb1ca8d2737d8 ffffffffb383e799 ffffb1ca8d273788
[24656.992245]  ffffb1ca8d273788 ffffb1ca8d273798 ffffb1ca8d273798 0000000002400001
[24656.992489] Call Trace:
[24656.992611]  [<ffffffffb383e799>] ? __schedule+0x179/0x5c8
[24656.992736]  [<ffffffffb383ecc2>] ? schedule+0x32/0x80
[24656.992861]  [<ffffffffb3841002>] ? rwsem_down_read_failed+0xb2/0x100
[24656.992987]  [<ffffffffb3466004>] ? call_rwsem_down_read_failed+0x14/0x30
[24656.993114]  [<ffffffffb3840443>] ? down_read+0x13/0x30
[24656.993239]  [<ffffffffb3373420>] ? xfs_map_blocks+0x90/0x2f0
[24656.993366]  [<ffffffffb3375256>] ? xfs_do_writepage+0x2b6/0x6a0
[24656.993496]  [<ffffffffb3245589>] ? submit_bh_wbc+0x169/0x200
[24656.993627]  [<ffffffffb337566c>] ? xfs_vm_writepage+0x2c/0x50
[24656.993760]  [<ffffffffb319bb0b>] ? pageout.isra.60+0xeb/0x2e0
[24656.993891]  [<ffffffffb319df06>] ? shrink_page_list+0x736/0xa50
[24656.994022]  [<ffffffffb319eb52>] ? shrink_inactive_list+0x202/0x4b0
[24656.994152]  [<ffffffffb319f4d1>] ? shrink_node_memcg+0x2e1/0x790
[24656.994282]  [<ffffffffb319fa49>] ? shrink_node+0xc9/0x2d0
[24656.994416]  [<ffffffffb31a0944>] ? kswapd+0x2e4/0x690
[24656.994546]  [<ffffffffb31a0660>] ? mem_cgroup_shrink_node+0x1a0/0x1a0
[24656.994678]  [<ffffffffb31a0660>] ? mem_cgroup_shrink_node+0x1a0/0x1a0
[24656.994808]  [<ffffffffb30b4fc2>] ? kthread+0xc2/0xe0
[24656.994937]  [<ffffffffb30b4f00>] ? __kthread_init_worker+0xb0/0xb0
[24656.995069]  [<ffffffffb38425e2>] ? ret_from_fork+0x22/0x30
This finally leads to "Kernel panic - not syncing: Out of memory and no
killable processes...", as no process is able to proceed.
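
Putting the three traces together, the dependency loop we believe we are
seeing looks like this (a summary of the traces above in C comment form,
not actual kernel source):

/*
 * rsync(19706): page fault -> __alloc_pages_nodemask()
 *   -> direct reclaim -> shrink_slab() -> super_cache_scan()
 *     -> xfs_reclaim_inodes_nr()          <- takes the reclaim mutex
 *       -> xfs_reclaim_inode() -> xfs_iflush() -> xfs_buf_get_map()
 *         -> xfs_buf_allocate_memory()
 *           -> __alloc_pages_nodemask()   <- needs memory again, stalls
 *
 * rsync(19707) and friends: direct reclaim -> xfs_reclaim_inodes_ag()
 *   -> mutex_lock()                       <- blocked behind 19706
 *
 * kswapd0: shrink_page_list() -> xfs_do_writepage() -> xfs_map_blocks()
 *   -> down_read()                        <- blocked on an XFS rwsem
 *
 * Nobody can free memory, and the OOM killer cannot make progress
 * either, which ends in the panic quoted above.
 */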
I quickly hacked this:
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 9ef152b..8adfb0a 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1254,7 +1254,7 @@ struct xfs_inode *
 	xfs_reclaim_work_queue(mp);
 	xfs_ail_push_all(mp->m_ail);

-	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
+	return 0; // xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
 }
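
A less drastic variant (untested, just a sketch of the idea, not
something we have run) would be to keep the trylock scan but drop
SYNC_WAIT, so the shrinker still reclaims clean inodes it can lock but
never blocks flushing dirty ones:

/*
 * Sketch only: without SYNC_WAIT, xfs_reclaim_inode() skips dirty
 * inodes instead of calling xfs_iflush() and waiting on the flush,
 * so the shrinker never ends up allocating buffer memory from inside
 * direct reclaim the way the first trace above shows.
 */
long
xfs_reclaim_inodes_nr(
	struct xfs_mount	*mp,
	int			nr_to_scan)
{
	/* kick background reclaim and push the AIL, as before */
	xfs_reclaim_work_queue(mp);
	xfs_ail_push_all(mp->m_ail);

	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK, &nr_to_scan);
}

The tradeoff in either variant is that direct reclaim can no longer
throttle on XFS inode reclaim, which is presumably what SYNC_WAIT is
there for; the plain "return 0" was enough for us to confirm where the
stall comes from.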
We ran two of our machines with this patch for a night; no more
lockups/stalls were detected. xfsaild does its work asynchronously, so
xfs_inode objects don't run wild, as confirmed by slabtop.
I put netconsole logs here: http://aplkv.beget.tech/lkml/xfs/ for anyone
interested.
--
Alexander Polakov | system software engineer | https://beget.com