On Mon 19-09-22 11:00:55, Zhenhua Huang wrote:
> When a driver was continuously allocating order 3
> pages, it would be very easily OOM even there were lots of reclaimable
> pages. A test module is used to reproduce this issue,
> several key ftrace events are as below:
>
> insmod-6968 [005] .... 321.306007: reclaim_retry_zone: node=0 zone=Normal
> order=3 reclaimable=539988 available=592856 min_wmark=21227 no_progress_loops=0
> wmark_check=0
> insmod-6968 [005] .... 321.306009: compact_retry: order=3
> priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=withdrawn retries=0
> max_retries=16 should_retry=1
> insmod-6968 [004] .... 321.308220:
> mm_compaction_try_to_compact_pages: order=3 gfp_mask=GFP_KERNEL priority=0
> insmod-6968 [004] .... 321.308964: mm_compaction_end:
> zone_start=0x80000 migrate_pfn=0xaa800 free_pfn=0x80800 zone_end=0x940000,
> mode=sync status=complete
> insmod-6968 [004] .... 321.308971: reclaim_retry_zone: node=0
> zone=Normal order=3 reclaimable=539830 available=592776 min_wmark=21227
> no_progress_loops=0 wmark_check=0
> insmod-6968 [004] .... 321.308973: compact_retry: order=3
> priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0
> max_retries=16 should_retry=0
>
> There're ~2GB reclaimable pages(reclaimable=539988) but VM decides not to
> reclaim any more:
> insmod-6968 [005] .... 321.306007: reclaim_retry_zone: node=0 zone=Normal
> order=3 reclaimable=539988 available=592856 min_wmark=21227 no_progress_loops=0
> wmark_check=0
>
> From meminfo when oom, there was NO qualified order >= 3 pages(CMA page not qualified)
> can meet should_reclaim_retry's requirement:
> Normal : 24671*4kB (UMEC) 13807*8kB (UMEC) 8214*16kB (UEC) 190*32kB (C)
> 94*64kB (C) 28*128kB (C) 16*256kB (C) 7*512kB (C) 5*1024kB (C) 7*2048kB (C)
> 46*4096kB (C) = 571796kB
>
> The reason of should_reclaim_retry early aborting was that is based on having the order
> pages in its free_list. For order 3 pages, that's easily fragmented.
> Considering enough free
> pages are the fundamental of compaction. It may not be suitable to stop reclaiming
> when lots of page cache there. Relax order by one to fix this issue.

For the higher order request we rely on should_compact_retry, which backs
off based on the compaction feedback. I would recommend looking into why
the compaction fails.

Also, this patch doesn't really explain why it should work, and honestly
it doesn't really make much sense to me either.

> With the change meminfo output when first OOM showing page cache was nearly
> exhausted:
>
> Normal free: 462956kB min:8644kB low:44672kB high:50844kB
> reserved_highatomic:4096KB active_anon:48kB inactive_anon:12kB
> active_file:508kB inactive_file:552kB unevictable:109016kB writepending:160kB
> present:7111680kB managed:6175004kB mlocked:107784kB pagetables:78732kB
> bounce:0kB free_pcp:996kB local_pcp:0kB free_cma:376412kB
>
> Signed-off-by: Zhenhua Huang <quic_zhenhuah@xxxxxxxxxxx>
> ---
>  mm/page_alloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 36b2021..b4ca6d1 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4954,8 +4954,11 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
>  		/*
>  		 * Would the allocation succeed if we reclaimed all
>  		 * reclaimable pages?
> +		 * considering fragmentation, enough free pages are the
> +		 * fundamental of compaction:
> +		 * lower the order requirement by one
>  		 */
> -		wmark = __zone_watermark_ok(zone, order, min_wmark,
> +		wmark = __zone_watermark_ok(zone, order ? order - 1 : 0, min_wmark,
>  				ac->highest_zoneidx, alloc_flags, available);
>  		trace_reclaim_retry_zone(z, order, reclaimable,
>  				available, min_wmark, *no_progress_loops, wmark);
> -- 
> 2.7.4

-- 
Michal Hocko
SUSE Labs
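
For illustration only, the free-list situation described in the quoted
report can be modeled in user space. This is a rough sketch, not kernel
code: it assumes the high-order part of the watermark check amounts to
"is there a free block of at least the requested order on a non-CMA
list", and it ignores watermarks, lowmem reserves and the highatomic
reserve. The names free_area_model, has_usable_block and total_free_kb
are hypothetical; the per-order counts and migratetype letters are
copied from the "Normal :" line of the OOM report.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per buddy order, as printed in the OOM report. */
struct free_area_model {
	unsigned long nr_blocks;	/* free blocks of this order */
	const char *types;		/* migratetype letters with free blocks */
};

/* Zone Normal, orders 0..10 (4kB .. 4096kB), copied from the report. */
static const struct free_area_model normal_zone[] = {
	{ 24671, "UMEC" }, { 13807, "UMEC" }, { 8214, "UEC" }, { 190, "C" },
	{ 94, "C" }, { 28, "C" }, { 16, "C" }, { 7, "C" },
	{ 5, "C" }, { 7, "C" }, { 46, "C" },
};

#define NR_ORDERS (sizeof(normal_zone) / sizeof(normal_zone[0]))

/*
 * Simplified model: return true if some order >= 'order' has a free
 * block on a list the request may use. CMA ('C') blocks only count
 * when allow_cma is set.
 */
bool has_usable_block(unsigned int order, bool allow_cma)
{
	for (size_t o = order; o < NR_ORDERS; o++) {
		if (!normal_zone[o].nr_blocks)
			continue;
		for (const char *t = normal_zone[o].types; *t; t++) {
			if (*t != 'C' || allow_cma)
				return true;
		}
	}
	return false;
}

/* Sum of nr_blocks * block size over all orders, in kB (4kB base pages). */
unsigned long total_free_kb(void)
{
	unsigned long kb = 0;

	for (size_t o = 0; o < NR_ORDERS; o++)
		kb += normal_zone[o].nr_blocks * (4UL << o);
	return kb;
}
```

With these numbers, total_free_kb() reproduces the 571796kB total from
the report, has_usable_block(3, false) is false and
has_usable_block(2, false) is true. That is the only thing lowering the
order by one changes in this model; it says nothing about whether
compaction could then assemble an order-3 block.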