On Fri, 26 Apr 2024 19:29:38 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:

> See commit f2c817bed58d ("mm: use memalloc_nofs_save in readahead
> path"), ensure that page_cache_ra_order() do not attempt to reclaim
> file-backed pages too, or it leads to a deadlock, found issue when
> test ext4 large folio.
>
>  INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:DataXceiver for state:D stack:0 pid:7494 ppid:1 flags:0x00000200
>  Call trace:
>   __switch_to+0x14c/0x240
>   __schedule+0x82c/0xdd0
>   schedule+0x58/0xf0
>   io_schedule+0x24/0xa0
>   __folio_lock+0x130/0x300
>   migrate_pages_batch+0x378/0x918
>   migrate_pages+0x350/0x700
>   compact_zone+0x63c/0xb38
>   compact_zone_order+0xc0/0x118
>   try_to_compact_pages+0xb0/0x280
>   __alloc_pages_direct_compact+0x98/0x248
>   __alloc_pages+0x510/0x1110
>   alloc_pages+0x9c/0x130
>   folio_alloc+0x20/0x78
>   filemap_alloc_folio+0x8c/0x1b0
>   page_cache_ra_order+0x174/0x308
>   ondemand_readahead+0x1c8/0x2b8
>   page_cache_async_ra+0x68/0xb8
>   filemap_readahead.isra.0+0x64/0xa8
>   filemap_get_pages+0x3fc/0x5b0
>   filemap_splice_read+0xf4/0x280
>   ext4_file_splice_read+0x2c/0x48 [ext4]
>   vfs_splice_read.part.0+0xa8/0x118
>   splice_direct_to_actor+0xbc/0x288
>   do_splice_direct+0x9c/0x108
>   do_sendfile+0x328/0x468
>   __arm64_sys_sendfile64+0x8c/0x148
>   invoke_syscall+0x4c/0x118
>   el0_svc_common.constprop.0+0xc8/0xf0
>   do_el0_svc+0x24/0x38
>   el0_svc+0x4c/0x1f8
>   el0t_64_sync_handler+0xc0/0xc8
>   el0t_64_sync+0x188/0x190
>
> Cc: zhangyi (F) <yi.zhang@xxxxxxxxxx>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@xxxxxxxxxx>

I'm thinking

Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Cc: stable

> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -494,6 +494,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  	pgoff_t index = readahead_index(ractl);
>  	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
>  	pgoff_t mark = index + ra->size - ra->async_size;
> +	unsigned int nofs;
>  	int err = 0;
>  	gfp_t gfp = readahead_gfp_mask(mapping);
>
> @@ -508,6 +509,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
>  		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
>  	}
>
> +	/* See comment in page_cache_ra_unbounded() */
> +	nofs = memalloc_nofs_save();
>  	filemap_invalidate_lock_shared(mapping);
>  	while (index <= limit) {
>  		unsigned int order = new_order;
> @@ -531,6 +534,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
>
>  	read_pages(ractl);
>  	filemap_invalidate_unlock_shared(mapping);
> +	memalloc_nofs_restore(nofs);
>
>  	/*
>  	 * If there were already pages in the page cache, then we may have
> --
> 2.41.0
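
For readers unfamiliar with the scoped NOFS pattern the patch relies on, here is a minimal user-space sketch of the save/restore idea. It is not the kernel implementation: every model_* name and MODEL_PF_NOFS is a hypothetical stand-in (for current->flags, PF_MEMALLOC_NOFS, memalloc_nofs_save() and memalloc_nofs_restore()), and the "may enter fs" check is a simplification of what the allocator does with the flag. It only illustrates why the saved value is passed back to restore: a nested scope must not clear a flag that an outer scope set.

```c
#include <assert.h>

/* Hypothetical stand-ins for current->flags and PF_MEMALLOC_NOFS. */
#define MODEL_PF_NOFS 0x1u
static unsigned int model_task_flags;

/* Enter a NOFS scope and return the previous state of the flag. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave the scope by putting the flag back to its saved state. */
static void model_nofs_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | old;
}

/* In this model, reclaim may enter the filesystem only outside a scope. */
static int model_may_enter_fs(void)
{
	return !(model_task_flags & MODEL_PF_NOFS);
}

/* Same shape as the patched function: save, do the work, restore. */
static int model_ra_order(void)
{
	unsigned int nofs = model_nofs_save();
	int allowed = model_may_enter_fs();	/* 0 while inside the scope */

	model_nofs_restore(nofs);
	return allowed;
}
```

Calling model_ra_order() from inside an outer model_nofs_save() scope leaves the flag set on return, which is the property that makes the kernel's scope API safe to nest.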