On Fri, Aug 11, 2023 at 07:15:19AM +0000, Hui Zhu wrote:
> From: Hui Zhu <teawater@xxxxxxxxxxxx>
>
> This version fix the gfp flags in the callers instead of working this
> new "bool" flag through the buffer head layers according to the comments
> from Matthew Wilcox.

FYI, this paragraph should have been below the --- so it gets excluded
from the commit log.

> Meanwhile, it was observed that the task holding the ext4 journal lock
> was blocked for an extended period of time on "shrink_page_list" due to
> "ext4_sb_breadahead_unmovable".
> 0 [] __schedule at xxxxxxxxxxxxxxx
> 1 [] _cond_resched at xxxxxxxxxxxxxxx
> 2 [] shrink_page_list at xxxxxxxxxxxxxxx
> 3 [] shrink_inactive_list at xxxxxxxxxxxxxxx
> 4 [] shrink_lruvec at xxxxxxxxxxxxxxx
> 5 [] shrink_node_memcgs at xxxxxxxxxxxxxxx
> 6 [] shrink_node at xxxxxxxxxxxxxxx
> 7 [] shrink_zones at xxxxxxxxxxxxxxx
> 8 [] do_try_to_free_pages at xxxxxxxxxxxxxxx
> 9 [] try_to_free_mem_cgroup_pages at xxxxxxxxxxxxxxx
> 10 [] try_charge at xxxxxxxxxxxxxxx
> 11 [] mem_cgroup_charge at xxxxxxxxxxxxxxx
> 12 [] __add_to_page_cache_locked at xxxxxxxxxxxxxxx
> 13 [] add_to_page_cache_lru at xxxxxxxxxxxxxxx
> 14 [] pagecache_get_page at xxxxxxxxxxxxxxx
> 15 [] grow_dev_page at xxxxxxxxxxxxxxx

After applying your patch, we'd still get into trouble with
folio_alloc_buffers() also specifying __GFP_NOWAIT.  So I decided to
pass the GFP flags into folio_alloc_buffers() -- see the patch series I
just sent out.

> @@ -1050,18 +1051,27 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>  	int ret = 0;
>  	gfp_t gfp_mask;
>
> -	gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp;
> +	gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS);
> +	if (gfp == ~__GFP_DIRECT_RECLAIM)
> +		gfp_mask &= ~__GFP_DIRECT_RECLAIM;

This isn't how we normally use gfp_mask.  OTOH, how buffer.c uses GFP
masks is also a bit weird.  The bdev_getblk() I just added is more
normal.
Please try the patchset I cc'd you on (with the __GFP_ACCOUNT added);
I'm currently running it through xfstests and it's holding up fine.  I
suppose I should play around with memcgs to try to make the reclaim
stall happen a bit more often.