Re: BUG() due to "mm: put_and_wait_on_page_locked() while page is migrated"

Qian Cai <cai@xxxxxx> · Thu, 24 Jan 2019 23:31:46 -0500

On 1/24/19 11:19 PM, Hugh Dickins wrote:
> On Wed, 23 Jan 2019, Michal Hocko wrote:
>> On Tue 22-01-19 23:29:04, Qian Cai wrote:
>>> Running LTP migrate_pages03 [1] a few times triggering BUG() below on an arm64
>>> ThunderX2 server. Reverted the commit 9a1ea439b16b9 ("mm:
>>> put_and_wait_on_page_locked() while page is migrated") allows it to run
>>> continuously.
>>>
>>> put_and_wait_on_page_locked
>>>   wait_on_page_bit_common
>>>     put_page
>>>       put_page_testzero
>>>         VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>>>
>>> [1]
>>> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/migrate_pages/migrate_pages03.c
>>>
>>> [ 1304.643587] page:ffff7fe0226ff000 count:2 mapcount:0 mapping:ffff8095c3406d58 index:0x7
>>> [ 1304.652082] xfs_address_space_operations [xfs]
>> [...]
>>> [ 1304.682652] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
>>
>> This looks like a page reference countimbalance to me. The page seemed
>> to be freed at the the migration code (wait_on_page_bit_common) called
>> put_page and immediatelly got reused for xfs allocation and that is why
>> we see its ref count==2. But I fail to see how that is possible as
>> __migration_entry_wait already does get_page_unless_zero so the
>> imbalance must have been preexisting.
> 
> This report worried me, but I've thought around it, and agree with
> Michal that it must be reflecting a preexisting refcount imbalance -
> preexisting in the sense that the imbalance occurred sometime before
> reaching put_and_wait_on_page_locked(), and in the sense that the bug
> causing the imbalance came in before the put_and_wait_on_page_locked()
> commit, perhaps even long ago.
> 
> If it is a software bug at all - I wonder if any other hardware shows
> the same issue - I have not seen it on x86 (though I wasn't using xfs),
> nor heard of anyone else reporting it - but thank you for doing so,
> it could be important.
> 
> But I (probably) disagree with Michal about the page being freed and
> reused for xfs allocation. I have no proof, but I think the likelihood
> is that the page shown is the old xfs page (from libc-2.28.so, I see)
> which is currently being migrated.
> 
> I realize that "last migrate reason: syscall_or_cpuset" would not get 
> set until later, but I think it's left over from the previous migration:
> migrate_pages03 looks like it's migrating pages back and forth repeatedly.
> 
> What I think happened is that something at some time earlier did a
> mistaken put_page() on the page.  Then __migration_entry_wait() raced
> with migrate_page_move_mapping(), in such a way that get_page_unless_zero()
> then briefly raised the page's refcount to expected_count, so migration was
> able to freeze the page (set its refcount transiently to 0).  Then put_and
> _wait_on_page_locked() reached the put_page() in wait_on_page_bit_common()
> while migration still had the refcount frozen at 0, and bang, your crash.
> 
> But how come reverting the put_and_wait commit appears to fix it for you?
> That puzzled me, for a while I expected you then to see an equally visible
> crash in the old put_page() after wait_on_page_locked(), or else at the
> migration end where it puts the page afterwards (putback_lru_page perhaps).
> 
> I guess the answer comes from that "libc-2.28.so".  This page is one of
> those very popular pages which were next-to-impossible to migrate before
> the put_and_wait commit, because they are so widely mapped, and their
> migration entries so frequently faulted, that migration could not freeze
> them.  (With enough migration waiters to outweigh the off-by-one of the
> incorrect refcount.)
> 
> Being so widely used, the refcount imbalance on that page would (I think)
> only show up when unmounting the root at shutdown: easily missed.
> 
> So I think you've identified that the put_and_wait commit has exposed
> an existing bug, and it may be very tedious to track down where that is.
> Maybe the bug is itself triggered by migrate_pages03, but quite likely not.

It looks like the put_and_wait commit just make the bug easier to reproduce, as
it has finally been able to reproduce it (via a different path) after 50+ runs
of migrate_pages03 on one of the affected machines even with the commit reverted.

[17890.870176] page:ffff7fe02563c780 count:0 mapcount:0 mapping:ffff800803ce6d58
index:0x1
[17890.879190] xfs_address_space_operations [xfs]
[17890.879196] name:"ld-2.28.so"
[17890.883724] flags: 0x17ffffc00000807(locked|referenced|uptodate|arch_1)
[17890.893376] raw: 017ffffc00000807 ffff8094df8a7c40 ffff7fe02561a948
0000000000000000
[17890.901111] raw: 0000000000000001 0000000000000000 00000002ffffffff
ffff80082039b080
[17890.908845] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
[17890.915710] page->mem_cgroup:ffff80082039b080
[17890.920065] page allocated via order 0, migratetype Movable, gfp_mask
0x62124a(GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE)
[17890.934245]  get_page_from_freelist+0x2d34/0x310c
[17890.938943]  __alloc_pages_nodemask+0x350/0x22d0
[17890.943559]  alloc_pages_current+0x154/0x158
[17890.947821]  __page_cache_alloc+0x274/0x27c
[17890.952002]  __do_page_cache_readahead+0x1e4/0x380
[17890.956785]  ondemand_readahead+0x790/0x97c
[17890.960961]  page_cache_sync_readahead+0x2c8/0x2cc
[17890.965743]  generic_file_buffered_read+0x2b4/0x143c
[17890.970700]  generic_file_read_iter+0x298/0x2e4
[17890.975433]  xfs_file_buffered_aio_read+0x5a0/0x5d0 [xfs]
[17890.981034]  xfs_file_read_iter+0x574/0x580 [xfs]
[17890.985735]  __vfs_read+0x478/0x4e8
[17890.989216]  vfs_read+0xe4/0x1fc
[17890.992436]  kernel_read+0xa8/0x110
[17890.995923]  load_elf_binary+0x92c/0x1b28
[17890.999932]  search_binary_handler+0x138/0x4dc
[17891.004368] page has been migrated, last migrate reason: syscall_or_cpuset
[17891.011294] ------------[ cut here ]------------
[17891.015903] kernel BUG at include/linux/mm.h:546!
[17891.020860] Internal error: Oops - BUG: 0 [#1] SMP
[17891.025645] Modules linked in: thunderx2_pmu ip_tables xfs libcrc32c sd_mod
ahci mlx5_core libahci libata dm_mirror dm_region_hash dm_log dm_mod efivarfs
[17891.039390] CPU: 230 PID: 10606 Comm: bash Kdump: loaded Not tainted
5.0.0-rc3+ #3
[17891.046950] Hardware name: HPE Apollo 70             /C01_APACHE_MB         ,
BIOS L50_5.13_1.0.6 07/10/2018
[17891.056767] pstate: 10400089 (nzcV daIf +PAN -UAO)
[17891.061553] pc : release_pages+0x1e8/0xdbc
[17891.065641] lr : release_pages+0x1e8/0xdbc
[17891.069727] sp : ffff8095580574d0
[17891.073032] x29: ffff8095580574d0 x28: 1fffeffc04afb720
[17891.078336] x27: 0000000000000001 x26: ffff7fe025602848
[17891.083639] x25: 0000000000000000 x24: 0000000000000034
[17891.088942] x23: ffff8095580575a0 x22: ffff80977c3b8400
[17891.094245] x21: ffff7fe02563c7b4 x20: dfff200000000000
[17891.099548] x19: ffff7fe02563c780 x18: 0000000000000000
[17891.104851] x17: 0000000000000000 x16: 0000000000000000
[17891.110153] x15: 0000000000000000 x14: 4f4d5f5046475f5f
[17891.115455] x13: 7c4c4c4157445241 x12: ffff0400026894e1
[17891.120758] x11: 1fffe400026894e1 x10: 5046475f5f7c4e52
[17891.126061] x9 : dfff200000000000 x8 : 737973203a6e6f73
[17891.131364] x7 : 0000000000000000 x6 : ffff20001021cc54
[17891.136666] x5 : 0000000000000000 x4 : 0000000000000000
[17891.141969] x3 : 0000000000000000 x2 : 29c8834f768b6d00
[17891.147275] x1 : 29c8834f768b6d00 x0 : 0000000000000000
[17891.152585] Process bash (pid: 10606, stack limit = 0x0000000036931683)
[17891.159190] Call trace:
[17891.161633]  release_pages+0x1e8/0xdbc
[17891.165379]  free_pages_and_swap_cache+0x60/0x200
[17891.170081]  tlb_flush_mmu_free+0xac/0xe4
[17891.174083]  tlb_flush_mmu+0x22c/0x37c
[17891.177824]  arch_tlb_finish_mmu+0x158/0x260
[17891.182086]  tlb_finish_mmu+0x8c/0xcc
[17891.185741]  exit_mmap+0x268/0x334
[17891.189139]  mmput+0x118/0x2c8
[17891.192187]  flush_old_exec+0x3a8/0x4fc
[17891.196016]  load_elf_binary+0x430/0x1b28
[17891.200019]  search_binary_handler+0x138/0x4dc
[17891.204454]  load_script+0x45c/0x484
[17891.208022]  search_binary_handler+0x138/0x4dc
[17891.212459]  __do_execve_file+0x1144/0x1808
[17891.216634]  do_execve+0x40/0x50
[17891.219855]  __arm64_sys_execve+0x8c/0xa0
[17891.223867]  el0_svc_handler+0x258/0x304
[17891.227785]  el0_svc+0x8/0xc
[17891.230660] Code: 91168021 911d0021 aa1303e0 9401a40a (d4210000)