Hello Vlastimil

On 2015/1/19 18:05, Vlastimil Babka wrote:
> Compaction employs two page scanners - the migration scanner isolates pages
> to be the source of migration, the free page scanner isolates pages to be
> the target of migration. Currently, the migration scanner starts at the
> zone's first pageblock and progresses towards the last one. The free scanner
> starts at the last pageblock and progresses towards the first one. Within a
> pageblock, each scanner scans pages from the first to the last one. When the
> scanners meet within the same pageblock, compaction terminates.
>
> One consequence of the current scheme, which turns out to be unfortunate, is
> that the migration scanner does not encounter the pageblocks which were
> scanned by the free scanner. In a test with stress-highalloc from mmtests,
> the scanners were observed to meet around the middle of the zone in the
> first two phases (with background memory pressure) of the test when executed
> after a fresh reboot. On further executions without reboot, the meeting
> point shifts to roughly a third of the zone, and compaction activity as well
> as allocation success rates deteriorate compared to the run after a fresh
> reboot.
>
> It turns out that the deterioration is indeed due to the migration scanner
> processing only a small part of the zone. Compaction also keeps making this
> bias worse by its activity - by moving all migratable pages towards the end
> of the zone, the free scanner has to scan a lot of full pageblocks to find
> more free pages. The beginning of the zone contains pageblocks that have
> been compacted as much as possible, but the free pages there cannot be
> further merged into larger orders due to unmovable pages. The rest of the
> zone might contain more suitable pageblocks, but the migration scanner will
> not reach them. It also isn't able to move movable pages out of unmovable
> pageblocks there, which affects fragmentation.
>
> This patch is the first step to remove this bias. It allows the compaction
> scanners to start at an arbitrary pfn (aligned to a pageblock for practical
> purposes), called the pivot, within the zone. The migration scanner starts
> at the exact pfn, the free scanner starts at the pageblock preceding the
> pivot. The direction of scanning is unaffected, but when the migration
> scanner reaches the last pageblock of the zone, or the free scanner reaches
> the first pageblock, they wrap and continue with the first or last
> pageblock, respectively. Compaction terminates when either scanner has
> wrapped and both meet within the same pageblock.
>
> For easier bisection of potential regressions, this patch always uses the
> zone's first pfn as the pivot. That means the free scanner immediately wraps
> to the last pageblock and the operation of the scanners is thus unchanged.
> The actual pivot changing is done by the next patch.
>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

I read through the whole patch, and you can feel free to add:

Acked-by: Zhang Yanfei <zhangyanfei@xxxxxxxxxxxxxx>

I agree with the approach to improve the current scheme. One thing that
should be treated carefully is how to avoid migrating pages back and forth,
since the pivot pfn can change. I see patch 5 introduces a policy for
changing the pivot, so we can keep a careful eye on that.

(The changes in the patch make the code more difficult to understand now,
and I found a tiny mistake, please see below.)
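To double-check my understanding of the wrap-around behaviour described in
the changelog, here is a minimal userspace model of the two scanners. This is
not kernel code: the toy_zone/toy_scanners names, the 16-pageblock zone and
the pivot at block 10 are made up for illustration, and positions are tracked
at pageblock granularity only.

/*
 * Toy model of the pivot-based compaction scanners: the migration scanner
 * walks upwards from the pivot, the free scanner walks downwards from the
 * block below it, both wrap at the zone boundaries, and compaction stops
 * once at least one of them has wrapped and they meet.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_zone { unsigned long first_pb, last_pb; };  /* inclusive range */

struct toy_scanners {
        unsigned long pivot;    /* pageblock where both scanners started */
        unsigned long migrate;  /* scans upwards from the pivot */
        unsigned long free;     /* scans downwards from the block below it */
};

/* Advance the migration scanner by one pageblock, wrapping at the zone end. */
static void migrate_step(struct toy_scanners *s, const struct toy_zone *z)
{
        s->migrate = (s->migrate == z->last_pb) ? z->first_pb : s->migrate + 1;
}

/* Advance the free scanner by one pageblock, wrapping at the zone start. */
static void free_step(struct toy_scanners *s, const struct toy_zone *z)
{
        s->free = (s->free == z->first_pb) ? z->last_pb : s->free - 1;
}

/* Terminate once at least one scanner has wrapped and they meet. */
static bool scanners_met(const struct toy_scanners *s)
{
        bool migrate_wrapped = s->migrate < s->pivot;
        bool free_wrapped = s->free >= s->pivot;

        /* Neither or both wrapped: only the unexpected both-wrapped case terminates */
        if (migrate_wrapped == free_wrapped)
                return migrate_wrapped;
        /* Exactly one scanner wrapped: terminate once free is at or below migrate */
        return s->free <= s->migrate;
}

int main(void)
{
        struct toy_zone z = { .first_pb = 0, .last_pb = 15 };
        struct toy_scanners s = { .pivot = 10, .migrate = 10, .free = 9 };
        unsigned long steps = 0;

        while (!scanners_met(&s)) {
                migrate_step(&s, &z);
                free_step(&s, &z);
                steps++;
        }
        printf("met after %lu steps: migrate=%lu free=%lu\n",
               steps, s.migrate, s.free);
        return 0;
}

For that made-up zone the model reports the scanners meeting after 8 steps,
with the free scanner one pageblock below the migration scanner, which
matches my reading of compact_scanners_met() below.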
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> Cc: Michal Nazarewicz <mina86@xxxxxxxxxx>
> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Cc: Christoph Lameter <cl@xxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  include/linux/mmzone.h |   2 +
>  mm/compaction.c        | 204 +++++++++++++++++++++++++++++++++++++++++++------
>  mm/internal.h          |   1 +
>  3 files changed, 182 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2f0856d..47aa181 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -503,6 +503,8 @@ struct zone {
>         unsigned long percpu_drift_mark;
>
> #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +       /* pfn where compaction scanners have initially started last time */
> +       unsigned long compact_cached_pivot_pfn;
>         /* pfn where compaction free scanner should start */
>         unsigned long compact_cached_free_pfn;
>         /* pfn where async and sync compaction migration scanner should start */
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5626220..abae89a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -123,11 +123,16 @@ static inline bool isolation_suitable(struct compact_control *cc,
>         return !get_pageblock_skip(page);
> }
>
> +/*
> + * Invalidate cached compaction scanner positions, so that compact_zone()
> + * will reinitialize them on the next compaction.
> + */
> static void reset_cached_positions(struct zone *zone)
> {
> -       zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
> -       zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
> -       zone->compact_cached_free_pfn = zone_end_pfn(zone);
> +       /* Invalid values are re-initialized in compact_zone */
> +       zone->compact_cached_migrate_pfn[0] = 0;
> +       zone->compact_cached_migrate_pfn[1] = 0;
> +       zone->compact_cached_free_pfn = 0;
> }
>
> /*
> @@ -172,11 +177,35 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>                 /* Only flush if a full compaction finished recently */
>                 if (zone->compact_blockskip_flush) {
>                         __reset_isolation_suitable(zone);
> -                       reset_cached_positions(zone);
> +                       reset_cached_positions(zone, false);

The second argument should be in patch 5 instead of here, right? ^-^

Thanks.

>                 }
>         }
> }
>
> +static void update_cached_migrate_pfn(unsigned long pfn,
> +               unsigned long pivot_pfn, unsigned long *old_pfn)
> +{
> +       /* Both old and new pfn either wrapped or not, and new is higher */
> +       if (((*old_pfn >= pivot_pfn) == (pfn >= pivot_pfn))
> +                                               && (pfn > *old_pfn))
> +               *old_pfn = pfn;
> +       /* New pfn has wrapped and the old didn't yet */
> +       else if ((*old_pfn >= pivot_pfn) && (pfn < pivot_pfn))
> +               *old_pfn = pfn;
> +}
> +
> +static void update_cached_free_pfn(unsigned long pfn,
> +               unsigned long pivot_pfn, unsigned long *old_pfn)
> +{
> +       /* Both old and new pfn either wrapped or not, and new is lower */
> +       if (((*old_pfn < pivot_pfn) == (pfn < pivot_pfn))
> +                                               && (pfn < *old_pfn))
> +               *old_pfn = pfn;
> +       /* New pfn has wrapped and the old didn't yet */
> +       else if ((*old_pfn < pivot_pfn) && (pfn >= pivot_pfn))
> +               *old_pfn = pfn;
> +}
> +
> /*
>  * If no pages were isolated then mark this pageblock to be skipped in the
>  * future. The information is later cleared by __reset_isolation_suitable().
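The wrap-aware cache update above is the part I had to read twice, so here is
how I understand it, as a standalone harness. The function body just mirrors
the quoted update_cached_migrate_pfn(); the pivot and the pfn values in main()
are made up for illustration.

/*
 * Standalone harness for the wrap-aware migrate-pfn cache update: the cached
 * position may only advance in the wrapped ordering (pivot .. zone end, then
 * zone start .. pivot), never move back towards the pivot.
 */
#include <assert.h>
#include <stdio.h>

static void update_cached_migrate_pfn(unsigned long pfn,
                unsigned long pivot_pfn, unsigned long *old_pfn)
{
        /* Both old and new pfn either wrapped or not, and new is higher */
        if (((*old_pfn >= pivot_pfn) == (pfn >= pivot_pfn)) && (pfn > *old_pfn))
                *old_pfn = pfn;
        /* New pfn has wrapped and the old didn't yet */
        else if ((*old_pfn >= pivot_pfn) && (pfn < pivot_pfn))
                *old_pfn = pfn;
}

int main(void)
{
        unsigned long pivot = 0x1000, cached = 0x1000;

        /* Ordinary forward progress above the pivot updates the cache. */
        update_cached_migrate_pfn(0x1400, pivot, &cached);
        assert(cached == 0x1400);

        /* A lower, not-yet-wrapped pfn must not move the cache backwards. */
        update_cached_migrate_pfn(0x1200, pivot, &cached);
        assert(cached == 0x1400);

        /* A pfn below the pivot means the scanner wrapped: accept it. */
        update_cached_migrate_pfn(0x0200, pivot, &cached);
        assert(cached == 0x0200);

        /* After wrapping, "higher" again means closer to the pivot. */
        update_cached_migrate_pfn(0x0400, pivot, &cached);
        assert(cached == 0x0400);

        printf("wrap-aware cache updates behaved as expected\n");
        return 0;
}

So the cached position only ever advances in the wrapped ordering and never
moves back towards the pivot, which looks correct to me.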
> @@ -186,6 +215,7 @@ static void update_pageblock_skip(struct compact_control *cc,
>                                         bool migrate_scanner)
> {
>         struct zone *zone = cc->zone;
> +       unsigned long pivot_pfn = cc->pivot_pfn;
>         unsigned long pfn;
>
>         if (cc->ignore_skip_hint)
> @@ -203,14 +233,14 @@ static void update_pageblock_skip(struct compact_control *cc,
>
>         /* Update where async and sync compaction should restart */
>         if (migrate_scanner) {
> -               if (pfn > zone->compact_cached_migrate_pfn[0])
> -                       zone->compact_cached_migrate_pfn[0] = pfn;
> -               if (cc->mode != MIGRATE_ASYNC &&
> -                   pfn > zone->compact_cached_migrate_pfn[1])
> -                       zone->compact_cached_migrate_pfn[1] = pfn;
> +               update_cached_migrate_pfn(pfn, pivot_pfn,
> +                               &zone->compact_cached_migrate_pfn[0]);
> +               if (cc->mode != MIGRATE_ASYNC)
> +                       update_cached_migrate_pfn(pfn, pivot_pfn,
> +                               &zone->compact_cached_migrate_pfn[1]);
>         } else {
> -               if (pfn < zone->compact_cached_free_pfn)
> -                       zone->compact_cached_free_pfn = pfn;
> +               update_cached_free_pfn(pfn, pivot_pfn,
> +                               &zone->compact_cached_free_pfn);
>         }
> }
> #else
> @@ -808,14 +838,41 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>
> #endif /* CONFIG_COMPACTION || CONFIG_CMA */
> #ifdef CONFIG_COMPACTION
> +static inline bool migrate_scanner_wrapped(struct compact_control *cc)
> +{
> +       return cc->migrate_pfn < cc->pivot_pfn;
> +}
> +
> +static inline bool free_scanner_wrapped(struct compact_control *cc)
> +{
> +       return cc->free_pfn >= cc->pivot_pfn;
> +}
> +
> /*
>  * Test whether the free scanner has reached the same or lower pageblock than
>  * the migration scanner, and compaction should thus terminate.
>  */
> static inline bool compact_scanners_met(struct compact_control *cc)
> {
> -       return (cc->free_pfn >> pageblock_order)
> -               <= (cc->migrate_pfn >> pageblock_order);
> +       bool free_below_migrate = (cc->free_pfn >> pageblock_order)
> +               <= (cc->migrate_pfn >> pageblock_order);
> +
> +       if (migrate_scanner_wrapped(cc) != free_scanner_wrapped(cc))
> +               /*
> +                * Only one of the scanners has wrapped. Terminate if free
> +                * scanner is in the same or lower pageblock than migration
> +                * scanner.
> +                */
> +               return free_below_migrate;
> +       else
> +               /*
> +                * If neither scanner has wrapped, then free < start <=
> +                * migration and we return false by definition.
> +                * It shouldn't happen that both have wrapped, but even if it
> +                * does due to e.g. reading mismatched zone cached pfn's, then
> +                * migration < start <= free, so we return true and terminate.
> +                */
> +               return !free_below_migrate;
> }
>
> /*
> @@ -832,7 +889,10 @@ static void isolate_freepages(struct compact_control *cc)
>         unsigned long low_pfn;  /* lowest pfn scanner is able to scan */
>         int nr_freepages = cc->nr_freepages;
>         struct list_head *freelist = &cc->freepages;
> +       bool wrapping;  /* set to true in the first pageblock of the zone */
> +       bool wrapped;   /* set to true when either scanner has wrapped */
>
> +wrap:
>         /*
>          * Initialise the free scanner.
>          * The starting point is where we last
>          * successfully isolated from, zone-cached value, or the end of the
> @@ -848,14 +908,25 @@ static void isolate_freepages(struct compact_control *cc)
>         block_start_pfn = cc->free_pfn & ~(pageblock_nr_pages-1);
>         block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
>                                                 zone_end_pfn(zone));
> -       low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
> +
> +       wrapping = false;
> +       wrapped = free_scanner_wrapped(cc) || migrate_scanner_wrapped(cc);
> +       if (!wrapped)
> +               /*
> +                * If neither scanner wrapped yet, we are limited by zone's
> +                * beginning. Here we pretend that the zone starts pageblock
> +                * aligned to make the for-loop condition simpler.
> +                */
> +               low_pfn = zone->zone_start_pfn & ~(pageblock_nr_pages-1);
> +       else
> +               low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
>
>         /*
>          * Isolate free pages until enough are available to migrate the
>          * pages on cc->migratepages. We stop searching if the migrate
>          * and free page scanners meet or enough free pages are isolated.
>          */
> -       for (; block_start_pfn >= low_pfn;
> +       for (; !wrapping && block_start_pfn >= low_pfn;
>                                 block_end_pfn = block_start_pfn,
>                                 block_start_pfn -= pageblock_nr_pages,
>                                 isolate_start_pfn = block_start_pfn) {
> @@ -870,6 +941,24 @@
>                                                 && compact_should_abort(cc))
>                         break;
>
> +               /*
> +                * When we are limited by zone boundary, this means we have
> +                * reached its first pageblock.
> +                */
> +               if (!wrapped && block_start_pfn <= zone->zone_start_pfn) {
> +                       /* The zone might start in the middle of the pageblock */
> +                       block_start_pfn = zone->zone_start_pfn;
> +                       if (isolate_start_pfn <= zone->zone_start_pfn)
> +                               isolate_start_pfn = zone->zone_start_pfn;
> +                       /*
> +                        * For e.g. DMA zone with zone_start_pfn == 1, we will
> +                        * underflow block_start_pfn in the next loop
> +                        * iteration. We have to terminate the loop with other
> +                        * means.
> +                        */
> +                       wrapping = true;
> +               }
> +
>                 page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
>                                                                         zone);
>                 if (!page)
> @@ -903,6 +992,12 @@
>                         if (isolate_start_pfn >= block_end_pfn)
>                                 isolate_start_pfn =
>                                         block_start_pfn - pageblock_nr_pages;
> +                       else if (wrapping)
> +                               /*
> +                                * We have been in the first pageblock of the
> +                                * zone, but have not finished it yet.
> +                                */
> +                               wrapping = false;
>                         break;
>                 } else {
>                         /*
> @@ -913,6 +1008,20 @@
>                 }
>         }
>
> +       /* Did we reach the beginning of the zone? Wrap to the end. */
> +       if (!wrapped && wrapping) {
> +               isolate_start_pfn = (zone_end_pfn(zone)-1) &
> +                                               ~(pageblock_nr_pages-1);
> +               /*
> +                * If we haven't isolated anything, we have to continue
> +                * immediately, otherwise page migration will fail.
> +                */
> +               if (!nr_freepages && !cc->contended) {
> +                       cc->free_pfn = isolate_start_pfn;
> +                       goto wrap;
> +               }
> +       }
> +
>         /* split_free_page does not map the pages */
>         map_pages(freelist);
>
> @@ -984,10 +1093,11 @@ typedef enum {
> static isolate_migrate_t isolate_migratepages(struct zone *zone,
>                                         struct compact_control *cc)
> {
> -       unsigned long low_pfn, end_pfn;
> +       unsigned long low_pfn, end_pfn, max_pfn;
>         struct page *page;
>         const isolate_mode_t isolate_mode =
>                 (cc->mode == MIGRATE_ASYNC ?
>                                 ISOLATE_ASYNC_MIGRATE : 0);
> +       bool wrapped = migrate_scanner_wrapped(cc) || free_scanner_wrapped(cc);
>
>         /*
>          * Start at where we last stopped, or beginning of the zone as
> @@ -998,13 +1108,27 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>         /* Only scan within a pageblock boundary */
>         end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
>
> +       if (!wrapped) {
> +               /*
> +                * Neither of the scanners has wrapped yet, we are limited by
> +                * zone end. Here we pretend it's aligned to pageblock
> +                * boundary to make the for-loop condition simpler
> +                */
> +               max_pfn = ALIGN(zone_end_pfn(zone), pageblock_nr_pages);
> +       } else {
> +               /* If any of the scanners wrapped, we will meet free scanner */
> +               max_pfn = cc->free_pfn;
> +       }
> +
>         /*
>          * Iterate over whole pageblocks until we find the first suitable.
> -        * Do not cross the free scanner.
> +        * Do not cross the free scanner or the end of the zone.
>          */
> -       for (; end_pfn <= cc->free_pfn;
> +       for (; end_pfn <= max_pfn;
>                         low_pfn = end_pfn, end_pfn += pageblock_nr_pages) {
>
> +               if (!wrapped && end_pfn > zone_end_pfn(zone))
> +                       end_pfn = zone_end_pfn(zone);
>                 /*
>                  * This can potentially iterate a massively long zone with
>                  * many pageblocks unsuitable, so periodically check if we
> @@ -1047,6 +1171,10 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>         }
>
>         acct_isolated(zone, cc);
> +       /* Did we reach the end of the zone? Wrap to the beginning */
> +       if (!wrapped && low_pfn >= zone_end_pfn(zone))
> +               low_pfn = zone->zone_start_pfn;
> +
>         /* Record where migration scanner will be restarted. */
>         cc->migrate_pfn = low_pfn;
>
> @@ -1197,22 +1325,48 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>         }
>
>         /*
> -        * Setup to move all movable pages to the end of the zone. Used cached
> +        * Setup the scanner positions according to pivot pfn. Use cached
>          * information on where the scanners should start but check that it
>          * is initialised by ensuring the values are within zone boundaries.
>          */
> -       cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
> -       cc->free_pfn = zone->compact_cached_free_pfn;
> -       if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn) {
> -               cc->free_pfn = end_pfn & ~(pageblock_nr_pages-1);
> -               zone->compact_cached_free_pfn = cc->free_pfn;
> +       cc->pivot_pfn = zone->compact_cached_pivot_pfn;
> +       if (cc->pivot_pfn < start_pfn || cc->pivot_pfn > end_pfn) {
> +               cc->pivot_pfn = start_pfn;
> +               zone->compact_cached_pivot_pfn = cc->pivot_pfn;
> +               /* When starting position was invalid, reset the rest */
> +               reset_cached_positions(zone);
>         }
> +
> +       cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
>         if (cc->migrate_pfn < start_pfn || cc->migrate_pfn > end_pfn) {
> -               cc->migrate_pfn = start_pfn;
> +               cc->migrate_pfn = cc->pivot_pfn;
>                 zone->compact_cached_migrate_pfn[0] = cc->migrate_pfn;
>                 zone->compact_cached_migrate_pfn[1] = cc->migrate_pfn;
>         }
>
> +       cc->free_pfn = zone->compact_cached_free_pfn;
> +       if (cc->free_pfn < start_pfn || cc->free_pfn > end_pfn)
> +               cc->free_pfn = cc->pivot_pfn;
> +
> +       /*
> +        * Free scanner should start on the beginning of the pageblock below
> +        * the cc->pivot_pfn. If that's below the zone boundary, wrap to the
> +        * last pageblock of the zone.
> +        */
> +       if (cc->free_pfn == cc->pivot_pfn) {
> +               /* Don't underflow in zones starting with e.g.
> +                * pfn 1 */
> +               if (cc->pivot_pfn < pageblock_nr_pages) {
> +                       cc->free_pfn = (end_pfn-1) & ~(pageblock_nr_pages-1);
> +               } else {
> +                       cc->free_pfn = (cc->pivot_pfn - pageblock_nr_pages);
> +                       cc->free_pfn &= ~(pageblock_nr_pages-1);
> +                       if (cc->free_pfn < start_pfn)
> +                               cc->free_pfn = (end_pfn-1) &
> +                                               ~(pageblock_nr_pages-1);
> +               }
> +               zone->compact_cached_free_pfn = cc->free_pfn;
> +       }
> +
>         trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);
>
>         migrate_prep_local();
> diff --git a/mm/internal.h b/mm/internal.h
> index efad241..cb7b297 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -157,6 +157,7 @@ struct compact_control {
>         struct list_head migratepages;  /* List of pages being migrated */
>         unsigned long nr_freepages;     /* Number of isolated free pages */
>         unsigned long nr_migratepages;  /* Number of pages to migrate */
> +       unsigned long pivot_pfn;        /* Where the scanners initially start */
>         unsigned long free_pfn;         /* isolate_freepages search base */
>         unsigned long migrate_pfn;      /* isolate_migratepages search base */
>         enum migrate_mode mode;         /* Async or sync migration mode */
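For completeness, here is my reading of the free scanner initialisation in
the compact_zone() hunk above, as a standalone sketch. It is not kernel code:
pageblock_nr_pages is assumed to be 512 here, and the zone boundaries and
pivot values in main() are made up.

/*
 * Place the free scanner on the pageblock below the pivot, or wrap it to
 * the zone's last pageblock when that would fall below the zone start.
 */
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL        /* assumed pageblock size in pages */

static unsigned long initial_free_pfn(unsigned long pivot_pfn,
                unsigned long zone_start_pfn, unsigned long zone_end_pfn)
{
        unsigned long last_block = (zone_end_pfn - 1) & ~(PAGEBLOCK_NR_PAGES - 1);
        unsigned long free_pfn;

        /* Don't underflow in zones starting with e.g. pfn 1 */
        if (pivot_pfn < PAGEBLOCK_NR_PAGES)
                return last_block;

        free_pfn = (pivot_pfn - PAGEBLOCK_NR_PAGES) & ~(PAGEBLOCK_NR_PAGES - 1);
        if (free_pfn < zone_start_pfn)
                free_pfn = last_block;
        return free_pfn;
}

int main(void)
{
        /* Pivot in the middle of a zone: free scanner starts one block below. */
        printf("%lu\n", initial_free_pfn(10240, 1, 65536));    /* -> 9728 */

        /* Pivot in the zone's first pageblock: wrap to the last pageblock. */
        printf("%lu\n", initial_free_pfn(1, 1, 65536));         /* -> 65024 */
        return 0;
}

So a mid-zone pivot puts the free scanner on the pageblock just below it,
while a pivot in the zone's first pageblock wraps it straight to the last
pageblock, as the changelog describes.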