Thanks Michal for comments!
On 2022/9/19 16:14, Michal Hocko wrote:
On Mon 19-09-22 11:00:55, Zhenhua Huang wrote:
When a driver continuously allocates order-3 pages, the system can
very easily hit OOM even though there are lots of reclaimable pages.
A test module was used to reproduce this issue;
several key ftrace events are shown below:
insmod-6968 [005] .... 321.306007: reclaim_retry_zone: node=0 zone=Normal
order=3 reclaimable=539988 available=592856 min_wmark=21227 no_progress_loops=0
wmark_check=0
insmod-6968 [005] .... 321.306009: compact_retry: order=3
priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=withdrawn retries=0
max_retries=16 should_retry=1
insmod-6968 [004] .... 321.308220:
mm_compaction_try_to_compact_pages: order=3 gfp_mask=GFP_KERNEL priority=0
insmod-6968 [004] .... 321.308964: mm_compaction_end:
zone_start=0x80000 migrate_pfn=0xaa800 free_pfn=0x80800 zone_end=0x940000,
mode=sync status=complete
insmod-6968 [004] .... 321.308971: reclaim_retry_zone: node=0
zone=Normal order=3 reclaimable=539830 available=592776 min_wmark=21227
no_progress_loops=0 wmark_check=0
insmod-6968 [004] .... 321.308973: compact_retry: order=3
priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0
max_retries=16 should_retry=0
There are ~2GB of reclaimable pages (reclaimable=539988) but the VM decides not to
reclaim any more:
insmod-6968 [005] .... 321.306007: reclaim_retry_zone: node=0 zone=Normal
order=3 reclaimable=539988 available=592856 min_wmark=21227 no_progress_loops=0
wmark_check=0
From the meminfo output at OOM time, there were NO qualified order >= 3 pages (CMA pages do not qualify)
that could meet should_reclaim_retry's requirement:
Normal : 24671*4kB (UMEC) 13807*8kB (UMEC) 8214*16kB (UEC) 190*32kB (C)
94*64kB (C) 28*128kB (C) 16*256kB (C) 7*512kB (C) 5*1024kB (C) 7*2048kB (C)
46*4096kB (C) = 571796kB
The reason should_reclaim_retry aborts early is that its check is based on having pages
of the requested order in the free_list, and for order-3 pages the free_list is easily
fragmented. Enough free pages are the foundation of compaction, so it may not be suitable
to stop reclaiming while lots of page cache remains. Relax the order requirement by one to fix this issue.
For the higher order request we rely on should_compact_retry which backs
on based on the compaction feedback. I would recommend looking why the
compaction fails.
I think the reason for the compaction failure is that there are not enough free
pages. As the ftrace events show, free pages (which include CMA) numbered
only 592856 - 539988 = 52868 (reclaimable=539988, available=592856).
There are also restrictions such as suitable_migration_target() for free
pages and suitable_migration_source() for movable pages, so the eligible
targets are even fewer.
Also this patch doesn't really explain why it should work and honestly
it doesn't really make much sense to me either.
Sorry, my fault. IMO, the reason it should work is that, for this case of an
order-3 allocation, we can perform direct reclaim more times because we still
have order-2 pages (the order *lowered* by this change) in the
free_list (8214*16kB (UEC)). The order requirement which I have lowered
is in should_reclaim_retry -> __zone_watermark_ok:
	for (o = order; o < MAX_ORDER; o++) {
		struct free_area *area = &z->free_area[o];
		...
		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}
	}
The order-2 requirement can be met much more easily, hence the VM has more chances to return
true from should_reclaim_retry.
With the change applied, the meminfo output at the first OOM shows that page cache was nearly
exhausted:
Normal free: 462956kB min:8644kB low:44672kB high:50844kB
reserved_highatomic:4096KB active_anon:48kB inactive_anon:12kB
active_file:508kB inactive_file:552kB unevictable:109016kB writepending:160kB
present:7111680kB managed:6175004kB mlocked:107784kB pagetables:78732kB
bounce:0kB free_pcp:996kB local_pcp:0kB free_cma:376412kB
Signed-off-by: Zhenhua Huang <quic_zhenhuah@xxxxxxxxxxx>
---
mm/page_alloc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 36b2021..b4ca6d1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4954,8 +4954,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
/*
* Would the allocation succeed if we reclaimed all
* reclaimable pages?
+ * considering fragmentation, enough free pages are the
+ * fundamental of compaction:
+ * lower the order requirement by one
*/
- wmark = __zone_watermark_ok(zone, order, min_wmark,
+ wmark = __zone_watermark_ok(zone, order ? order - 1 : 0, min_wmark,
ac->highest_zoneidx, alloc_flags, available);
trace_reclaim_retry_zone(z, order, reclaimable,
available, min_wmark, *no_progress_loops, wmark);
--
2.7.4
BTW, regarding the test case:
I used continuous order-3 allocations with the GFP_KERNEL flag to
reproduce it. If you have other suggestions/ideas, please let me know.
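Roughly, the module does the following (a simplified sketch of the
reproducer, not the actual test module; the module name and message text
are made up, and freeing of the allocated pages is omitted for brevity):

```c
/* Sketch: keep allocating order-3 pages with GFP_KERNEL until the
 * allocation fails, driving the allocator into the OOM path. */
#include <linux/module.h>
#include <linux/gfp.h>

static int __init order3_stress_init(void)
{
	struct page *page;
	unsigned long count = 0;

	for (;;) {
		page = alloc_pages(GFP_KERNEL, 3); /* order-3: 8 pages */
		if (!page)
			break; /* allocation finally failed */
		count++;
	}
	pr_info("order3_stress: allocated %lu order-3 blocks\n", count);
	return 0;
}
module_init(order3_stress_init);
MODULE_LICENSE("GPL");
```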
Thank you.
Thanks,
Zhenhua