On Thu, Mar 28, 2019 at 09:43:29PM -0400, Qian Cai wrote:
> On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> > @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > +	unsigned long index = page_index(page);
> > +
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(page->index > offset, page);
> > -	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > -			page);
> > -	return page - page->index + offset;
> > +	VM_BUG_ON_PAGE(index > offset, page);
> > +	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > +	return page - index + offset;
> >  }
>
> Even with this patch, it is still able to trigger a panic below by running LTP
> mm tests. Always triggered by oom02 (or oom04) at the end.
>
> # /opt/ltp/runltp -f mm
>
> The problem is that in scan_swap_map_slots(),
>
> /* reuse swap entry of cache-only swap if not busy. */
> if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
> 	int swap_was_freed;
> 	unlock_cluster(ci);
> 	spin_unlock(&si->lock);
> 	swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
>
> but that swap entry has already been freed, and the page has PageSwapCache
> cleared and page->private is 0.

I don't understand how we get to this situation.  We SetPageSwapCache()
in add_to_swap_cache() right before we store the page in i_pages.
We ClearPageSwapCache() in __delete_from_swap_cache() right after
removing the page from the array.  So how do we find a page in a swap
address space that has PageSwapCache cleared?

Indeed, we have a check which should trigger ...

	VM_BUG_ON_PAGE(!PageSwapCache(page), page);

in __delete_from_swap_cache().

Oh ... is it a race?

 * Its ok to check for PageSwapCache without the page lock
 * here because we are going to recheck again inside
 * try_to_free_swap() _with_ the lock.

so CPU A does:

page = find_get_page(swap_address_space(entry), offset)
	page = find_subpage(page, offset);
trylock_page(page);

while CPU B does:

xa_lock_irq(&address_space->i_pages);
__delete_from_swap_cache(page, entry);
	xas_store(&xas, NULL);
	ClearPageSwapCache(page);
xa_unlock_irq(&address_space->i_pages);

and if the ClearPageSwapCache happens between the xas_load() and the
find_subpage(), we're stuffed.  CPU A has a reference to the page, but
not a lock, and find_get_page is running under RCU.

I suppose we could fix this by taking the i_pages xa_lock around the
call to find_get_page().  If indeed, that's what this problem is.
Want to try this patch?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b8d9c3fbb47..ed8e42be88b5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 				 unsigned long offset, unsigned long flags)
 {
 	swp_entry_t entry = swp_entry(si->type, offset);
+	struct address_space *mapping = swap_address_space(entry);
+	unsigned long irq_flags;
 	struct page *page;
 	int ret = 0;
 
-	page = find_get_page(swap_address_space(entry), offset);
+	xa_lock_irqsave(&mapping->i_pages, irq_flags);
+	page = find_get_page(mapping, offset);
+	xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
 	if (!page)
 		return 0;
 	/*

> swp_entry_t entry = swp_entry(si->type, offset)
>
> and then in find_subpage(),
>
> its page->index has a different meaning again and the calculation is now all wrong.
>
> return page - page->index + offset;
>
> [ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 7456.445737] LTP: starting oom03
> [ 7456.535940] LTP: starting oom04
> [ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009 index:0x7fa81584f
> [ 7493.086963] anon
> [ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked)
> [ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8 ffff88a79061d009
> [ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff ffff888f18278008
> [ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset)
> [ 7493.124126] page->mem_cgroup:ffff888f18278008
> [ 7493.129036] page_owner info is not active (free page?)
> [ 7493.134782] ------------[ cut here ]------------
> [ 7493.139937] kernel BUG at include/linux/pagemap.h:342!
> [ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G W 5.1.0-rc2-mm1+ #13
> [ 7493.163068] Hardware name: Lenovo ThinkSystem SR530 -[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017
> [ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0
> [ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40 02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7 c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8
> [ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282
> [ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e
> [ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480
> [ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90
> [ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0
> [ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0
> [ 7493.246496] FS: 00007f248398c700(0000) GS:ffff88979e880000(0000) knlGS:0000000000000000
> [ 7493.255527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0
> [ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7493.285830] Call Trace:
> [ 7493.288555] ? queued_spin_lock_slowpath+0x571/0x9e0
> [ 7493.294097] ? __filemap_set_wb_err+0x1f0/0x1f0
> [ 7493.299154] pagecache_get_page+0x4a/0xb70
> [ 7493.303729] __try_to_reclaim_swap+0xa3/0x400
> [ 7493.308593] scan_swap_map_slots+0xc05/0x1850
> [ 7493.313447] ? __try_to_reclaim_swap+0x400/0x400
> [ 7493.318600] ? do_raw_spin_lock+0x128/0x280
> [ 7493.323269] ? rwlock_bug.part.0+0x90/0x90
> [ 7493.327840] ? get_swap_pages+0x195/0x730
> [ 7493.332316] get_swap_pages+0x386/0x730
> [ 7493.336590] get_swap_page+0x2b2/0x643
> [ 7493.340774] ? rmap_walk+0x140/0x140
> [ 7493.344765] ? free_swap_slot+0x3c0/0x3c0
> [ 7493.349232] ? anon_vma_ctor+0xe0/0xe0
> [ 7493.353407] ? page_get_anon_vma+0x280/0x280
> [ 7493.358173] add_to_swap+0x10b/0x230
> [ 7493.362164] shrink_page_list+0x29d8/0x4960
> [ 7493.366822] ? page_evictable+0x11b/0x1d0
> [ 7493.371296] ? page_evictable+0x1d0/0x1d0
> [ 7493.375769] ? __isolate_lru_page+0x880/0x880
> [ 7493.380631] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.385977] ? shrink_inactive_list+0x484/0x13b0
> [ 7493.391130] ? lock_downgrade+0x760/0x760
> [ 7493.395608] ? kasan_check_read+0x11/0x20
> [ 7493.400082] ? do_raw_spin_unlock+0x59/0x250
> [ 7493.404848] shrink_inactive_list+0x4bf/0x13b0
> [ 7493.409823] ? move_pages_to_lru+0x1c90/0x1c90
> [ 7493.414795] ? kasan_check_read+0x11/0x20
> [ 7493.419261] ? lruvec_lru_size+0xef/0x4c0
> [ 7493.423738] ? call_function_interrupt+0xa/0x20
> [ 7493.428800] ? rcu_all_qs+0x11/0xc0
> [ 7493.432692] shrink_node_memcg+0x66a/0x1ee0
> [ 7493.437361] ? shrink_active_list+0x1150/0x1150
> [ 7493.442417] ? lock_downgrade+0x760/0x760
> [ 7493.446891] ? lock_acquire+0x169/0x360
> [ 7493.451177] ? mem_cgroup_iter+0x210/0xca0
> [ 7493.455747] ? kasan_check_read+0x11/0x20
> [ 7493.460221] ? mem_cgroup_protected+0x94/0x450
> [ 7493.465179] shrink_node+0x266/0x13c0
> [ 7493.469267] ? shrink_node_memcg+0x1ee0/0x1ee0
> [ 7493.474230] ? ktime_get+0xab/0x140
> [ 7493.478122] ? zone_reclaimable_pages+0x553/0x8d0
> [ 7493.483371] do_try_to_free_pages+0x349/0x11e0
> [ 7493.488333] ? allow_direct_reclaim.part.6+0xc3/0x240
> [ 7493.493971] ? shrink_node+0x13c0/0x13c0
> [ 7493.498352] ? queue_delayed_work_on+0x30/0x30
> [ 7493.503313] try_to_free_pages+0x277/0x740
> [ 7493.507884] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.513232] ? do_try_to_free_pages+0x11e0/0x11e0
> [ 7493.518482] __alloc_pages_nodemask+0xc37/0x2ab0
> [ 7493.523635] ? gfp_pfmemalloc_allowed+0x150/0x150
> [ 7493.528886] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.534226] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.539566] ? do_anonymous_page+0x450/0x1e00
> [ 7493.544419] ? lock_downgrade+0x760/0x760
> [ 7493.548896] ? __lru_cache_add+0xc2/0x240
> [ 7493.553372] alloc_pages_vma+0xb2/0x430
> [ 7493.557652] do_anonymous_page+0x50a/0x1e00
> [ 7493.562324] ? put_prev_task_fair+0x27c/0x720
> [ 7493.567189] ? finish_fault+0x290/0x290
> [ 7493.571471] __handle_mm_fault+0x1688/0x3bc0
> [ 7493.576227] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.581574] ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 7493.586824] handle_mm_fault+0x326/0x6cf
> [ 7493.591203] __do_page_fault+0x333/0x8d0
> [ 7493.595571] do_page_fault+0x75/0x48e
> [ 7493.599660] ? page_fault+0x5/0x20
> [ 7493.603458] page_fault+0x1b/0x20
> [ 7493.607156] RIP: 0033:0x410930
> [ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206
> [ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497
> [ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000
> [ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000
> [ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
> [ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0
> [ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod efivarfs