Re: compaction: trying to understand the code

Mel Gorman <mel@xxxxxxxxx> · Thu, 19 Aug 2010 09:15:54 +0100

On Thu, Aug 19, 2010 at 04:08:31PM +0800, Wu Fengguang wrote:
> On Thu, Aug 19, 2010 at 03:46:02PM +0800, Mel Gorman wrote:
> > On Thu, Aug 19, 2010 at 04:09:38PM +0900, Iram Shahzad wrote:
> > >> The loop should be waiting for the _other_ processes (doing direct
> > >> reclaims) to proceed.  When there are _lots of_ ongoing page
> > >> allocations/reclaims, it makes sense to wait for them to calm down a bit?
> > >
> > > I have noticed that if I run other process, it helps the loop to exit.
> > > So is this (ie hanging until other process helps) intended behaviour?
> > >
> > 
> > No, it's not but I'm not immediately seeing how it would occur either.
> > too_many_isolated() should only be true when there are multiple
> > processes running that are isolating pages be it due to reclaim or
> > compaction. These should be finishing their work after some time so
> > while a process may stall in too_many_isolated(), it should not stay
> > there forever.
> > 
> > The loop around isolate_migratepages() puts back LRU pages it failed to
> > migrate so it's not the case that the compacting process is isolating a
> > large number of pages and then calling too_many_isolated() against itself.
> 
> It seems the compaction process isolates 128MB pages at a time?

It should be one pageblock at a time for source migration and one pageblock
for target pages. Look at the values for low_pfn and end_pfn here;

static unsigned long isolate_migratepages(struct zone *zone,
                                        struct compact_control *cc)
{
        unsigned long low_pfn, end_pfn;
        struct list_head *migratelist = &cc->migratepages;

        /* Do not scan outside zone boundaries */
        low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);

        /* Only scan within a pageblock boundary */
        end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);

....

and the loop around that looks like

        while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
                unsigned long nr_migrate, nr_remaining;

                if (!isolate_migratepages(zone, cc))
                        continue;

                nr_migrate = cc->nr_migratepages;
                migrate_pages(&cc->migratepages, compaction_alloc,
                                                (unsigned long)cc, 0);
                update_nr_listpages(cc);
                nr_remaining = cc->nr_migratepages;

                count_vm_event(COMPACTBLOCKS);
                count_vm_events(COMPACTPAGES, nr_migrate - nr_remaining);
                if (nr_remaining)
                        count_vm_events(COMPACTPAGEFAILED, nr_remaining);

                /* Release LRU pages not migrated */
                if (!list_empty(&cc->migratepages)) {
                        putback_lru_pages(&cc->migratepages);
                        cc->nr_migratepages = 0;
                }

        }

Where is it isolating 128MB?

> That
> sounds risky, too_many_isolated() can easily be true, which will stall
> direct reclaim processes. I'm not seeing how exactly it makes
> compaction itself stall infinitely though.
> 
> > > Also, the other process does help the loop to exit, but again it enters
> > > the loop and the compaction is never finished. That is, the process
> > > looks like hanging. Is this intended behaviour?
> > 
> > Infinite loops are never intended behaviour.
> 
> Yup.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>