In low-memory situations (specifically in Docker containers), the
xfs_vm_readpages path may declare a memcg OOM during a filesystem page
fault and kill applications.

This patch extends commit 8a5c743e308d ("mm, memcg: use consistent gfp
flags during readahead") to cover XFS by making its readahead path use
readahead_gfp_mask(). The gfp_mask used by xfs_vm_readpages and the
related iomap functions is now aligned with readahead_gfp_mask(), so
readahead behaves consistently. As the printk output below shows, the
mask now carries __GFP_NORETRY and __GFP_NOWARN, which prevents OOM
kills caused by the discrepancy in gfp_mask handling.

Test results:

Start the container:

  docker container run --name wget.100m.ky -d --memory 104857600 --memory-swap 104857600

Inside the container (testfile is a large 2G file):

  wget http://172.17.0.1/testfile

Before the fix:

A debug printk of try_to_free_mem_cgroup_pages()'s parameters and
return value shows:

  gfp_mask=0x62004a (GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE)
  nr_reclaimed: 0

[ 153.390196] CPU: 1 PID: 5405 Comm: wget Kdump: loaded Not tainted 4.19.90-25+ #24
[ 153.390197] Hardware name: American Megatrends Inc. To be filled by O.E.M./To be filled by O.E.M., BIOS ITSW3001 09/14/2020
[ 153.390197] Call Trace:
[ 153.390199] dump_stack+0x64/0x88
[ 153.390200] try_to_free_mem_cgroup_pages.cold+0x30/0x3e
[ 153.390201] try_charge+0x2d9/0x7a0
[ 153.390202] ? memcg_check_events+0xdd/0x250
[ 153.390203] mem_cgroup_try_charge+0x8b/0x180
[ 153.390204] __add_to_page_cache_locked+0x64/0x240
[ 153.390205] add_to_page_cache_lru+0x48/0xe0
[ 153.390206] iomap_readpages_actor+0x10e/0x240
[ 153.390207] iomap_apply+0xc3/0x130
[ 153.390208] ? iomap_write_begin.constprop.0+0x310/0x310
[ 153.390209] iomap_readpages+0xa4/0x190
[ 153.390210] ? iomap_write_begin.constprop.0+0x310/0x310
[ 153.390211] read_pages.isra.0+0x72/0x190
[ 153.390212] __do_page_cache_readahead+0x1b2/0x1d0
[ 153.390214] filemap_fault+0x2d6/0x570
[ 153.390235] __xfs_filemap_fault+0x6b/0x200 [xfs]
[ 153.390236] __do_fault+0x38/0x120
[ 153.390237] do_fault+0x119/0x3e0
[ 153.390238] __handle_mm_fault+0x455/0x5d0
[ 153.390239] handle_mm_fault+0x90/0x1b0
[ 153.390240] __do_page_fault+0x2ea/0x540
[ 153.390242] do_page_fault+0x33/0x120
[ 153.390243] ? page_fault+0x8/0x30
[ 153.390243] page_fault+0x1e/0x30
[ 153.390244] RIP: 0033:0x7f5404794514
[ 153.390246] Code: Bad RIP value.
[ 153.390246] RSP: 002b:00007fff244f0728 EFLAGS: 00010246
[ 153.390246] RAX: 0000000000001000 RBX: 0000000000001000 RCX: 00007f5404794514
[ 153.390247] RDX: 0000000000001000 RSI: 000055ef7f87e640 RDI: 0000000000000004
[ 153.390247] RBP: 000055ef7f87e640 R08: 0000000000000000 R09: 000055ef7f87e670
[ 153.390248] R10: 000055ef7f87e620 R11: 0000000000000246 R12: 000055ef7f879d80
[ 153.390248] R13: 0000000000001000 R14: 00007f540485d7c0 R15: 0000000000001000
[ 153.390257] wget invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0
[ 153.390257] wget cpuset=bae816dd30bd6e193684d5580f57fd54df29c0a695dec5b7606931d248c18dd2 mems_allowed=0

When wget downloads the 2G file, the OOM killer kills the process almost
every time.

After the fix:

The same debug printk shows:

  gfp_mask=0x62124a (GFP_NOFS|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL|__GFP_MOVABLE)
  nr_reclaimed: 55

[ 196.970857] CPU: 9 PID: 5326 Comm: wget Kdump: loaded Not tainted 4.19.90-25+ #23
[ 196.970858] Hardware name: American Megatrends Inc. To be filled by O.E.M./To be filled by O.E.M., BIOS ITSW3001 09/14/2020
[ 196.970858] Call Trace:
[ 196.970860] dump_stack+0x64/0x88
[ 196.970861] try_to_free_mem_cgroup_pages.cold+0x30/0x3e
[ 196.970862] try_charge+0x2d9/0x7a0
[ 196.970863] ? memcg_check_events+0xdd/0x250
[ 196.970865] mem_cgroup_try_charge+0x8b/0x180
[ 196.970865] __add_to_page_cache_locked+0x64/0x240
[ 196.970866] add_to_page_cache_lru+0x48/0xe0
[ 196.970868] iomap_readpages_actor+0x125/0x250
[ 196.970869] iomap_apply+0xc3/0x130
[ 196.970870] ? iomap_write_begin.constprop.0+0x310/0x310
[ 196.970871] iomap_readpages+0xa4/0x190
[ 196.970872] ? iomap_write_begin.constprop.0+0x310/0x310
[ 196.970873] read_pages.isra.0+0x72/0x190
[ 196.970875] __do_page_cache_readahead+0x160/0x1d0
[ 196.970876] filemap_fault+0x2d6/0x570
[ 196.970897] __xfs_filemap_fault+0x6b/0x200 [xfs]
[ 196.970899] __do_fault+0x38/0x120
[ 196.970900] do_fault+0x119/0x3e0
[ 196.970901] __handle_mm_fault+0x455/0x5d0
[ 196.970903] handle_mm_fault+0x90/0x1b0
[ 196.970905] __do_page_fault+0x2ea/0x540
[ 196.970906] do_page_fault+0x33/0x120
[ 196.970907] ? page_fault+0x8/0x30
[ 196.970908] page_fault+0x1e/0x30
[ 196.970909] RIP: 0033:0x7fed5d34b340
[ 196.970911] Code: Bad RIP value.
[ 196.970912] RSP: 002b:00007ffcf231fd68 EFLAGS: 00010246
[ 196.970913] RAX: 0000000000000000 RBX: 000055f860649030 RCX: 00000000061a9000
[ 196.970913] RDX: 000055f860664980 RSI: 0000000000000000 RDI: 000055f860649030
[ 196.970913] RBP: 000000000000003b R08: 7fffffffffffffff R09: 7ffffffff9e58fff
[ 196.970914] R10: 000055f860667620 R11: 0000000000000246 R12: 00000000061a9000
[ 196.970914] R13: 0000000000000000 R14: 000055f860664b50 R15: 000055f860664980

The 2G wget download was repeated 500 times without the process being
killed.

Fixes: 8a5c743e308d ("mm, memcg: use consistent gfp flags during readahead")
Signed-off-by: Hu Song <husong@xxxxxxxxxx>
---
 fs/iomap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 04e82b6bd9bf..a34e4ec874f0 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -424,6 +424,7 @@ static struct page *
 iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 		loff_t length, loff_t *done)
 {
+	gfp_t gfp_mask = readahead_gfp_mask(inode->i_mapping);
 	while (!list_empty(pages)) {
 		struct page *page = lru_to_page(pages);
 
@@ -432,7 +433,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
 
 		list_del(&page->lru);
 		if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
-				GFP_NOFS))
+				gfp_mask | GFP_NOFS))
 			return page;
 
 		/*
-- 
2.25.1
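
For reference, the two gfp_mask values quoted in the test results can be
decoded outside the kernel. The following is a minimal userspace sketch,
not kernel code: the bit values in the table are an assumption taken from
include/linux/gfp.h as of v4.19 and may differ in other releases, and
decode_gfp() and gfp_report() are hypothetical helper names introduced
only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed flag bits (v4.19 include/linux/gfp.h); only the ones seen in the logs. */
struct gfp_bit {
	unsigned int bit;
	const char *name;
};

static const struct gfp_bit gfp_bits[] = {
	{ 0x02u,     "__GFP_HIGHMEM" },
	{ 0x08u,     "__GFP_MOVABLE" },
	{ 0x40u,     "__GFP_IO" },
	{ 0x80u,     "__GFP_FS" },
	{ 0x200u,    "__GFP_NOWARN" },
	{ 0x1000u,   "__GFP_NORETRY" },
	{ 0x20000u,  "__GFP_HARDWALL" },
	{ 0x200000u, "__GFP_DIRECT_RECLAIM" },
	{ 0x400000u, "__GFP_KSWAPD_RECLAIM" },
};

/*
 * Hypothetical helper: write the names of the known bits set in mask
 * into buf, separated by '|'. Returns the bits that are not in the table.
 */
static unsigned int decode_gfp(unsigned int mask, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
		if (!(mask & gfp_bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
		mask &= ~gfp_bits[i].bit;
	}
	return mask;
}

/* Hypothetical helper: print both logged masks and the bits that differ. */
static void gfp_report(void)
{
	const unsigned int before = 0x62004au;	/* mask logged before the fix */
	const unsigned int after = 0x62124au;	/* mask logged after the fix */
	char buf[256];

	decode_gfp(before, buf, sizeof(buf));
	printf("before: %#x %s\n", before, buf);
	decode_gfp(after, buf, sizeof(buf));
	printf("after:  %#x %s\n", after, buf);
	decode_gfp(before ^ after, buf, sizeof(buf));
	printf("added:  %#x %s\n", before ^ after, buf);
}
```

Calling gfp_report() from a small main() prints the decoded masks. Under
the assumed bit values, the only difference between the two masks is
__GFP_NOWARN|__GFP_NORETRY, which matches the flag lists in the printk
output above.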