Re: [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake

Minchan Kim <minchan.kim@xxxxxxxxx> · Tue, 17 Aug 2010 11:26:05 +0900

On Tue, Aug 17, 2010 at 1:06 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> [npiggin@xxxxxxx bounces, switched to yahoo address]
>
> On Mon, Aug 16, 2010 at 10:43:50AM +0100, Mel Gorman wrote:

<snip>

>> +      * potentially causing a live-lock. While kswapd is awake and
>> +      * free pages are low, get a better estimate for free pages
>> +      */
>> +     if (nr_free_pages < zone->percpu_drift_mark &&
>> +                     !waitqueue_active(&zone->zone_pgdat->kswapd_wait)) {
>> +             int cpu;
>> +
>> +             for_each_online_cpu(cpu) {
>> +                     struct per_cpu_pageset *pset;
>> +
>> +                     pset = per_cpu_ptr(zone->pageset, cpu);
>> +                     nr_free_pages += pset->vm_stat_diff[NR_FREE_PAGES];

We need to consider CONFIG_SMP here: vm_stat_diff is only defined in
struct per_cpu_pageset when CONFIG_SMP is set.
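
Something along these lines is what I have in mind. This is only a
userspace sketch, not a tested kernel change: struct sketch_zone,
struct sketch_pageset, SKETCH_NR_CPUS and sketch_zone_nr_free_pages()
are made-up stand-ins for the real structures, and the kswapd and
drift-mark checks are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 2	/* hypothetical CPU count for the sketch */

/* Stand-in for struct per_cpu_pageset: the delta exists only on SMP. */
struct sketch_pageset {
#ifdef CONFIG_SMP
	signed char vm_stat_diff;	/* pending, not yet folded delta */
#endif
	int count;
};

/* Stand-in for struct zone. */
struct sketch_zone {
	long nr_free;			/* models the global NR_FREE_PAGES */
	struct sketch_pageset pageset[SKETCH_NR_CPUS];
};

#ifdef CONFIG_SMP
/* SMP: add the per-cpu deltas that have not been folded back yet. */
static long sketch_zone_nr_free_pages(const struct sketch_zone *z)
{
	long nr_free_pages;
	int cpu;

	assert(z != NULL);
	nr_free_pages = z->nr_free;
	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		nr_free_pages += z->pageset[cpu].vm_stat_diff;
	return nr_free_pages;
}
#else
/* UP: there are no per-cpu deltas, so the global counter is exact. */
static long sketch_zone_nr_free_pages(const struct sketch_zone *z)
{
	assert(z != NULL);
	return z->nr_free;
}
#endif
```

With CONFIG_SMP undefined the helper never touches vm_stat_diff, so the
UP build does not reference a field that is not there.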

>> +             }
>> +     }
>> +
>> +     return nr_free_pages;
>> +}
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index c2407a4..67a2ed0 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1462,7 +1462,7 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
>>  {
>>       /* free_pages my go negative - that's OK */
>>       long min = mark;
>> -     long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
>> +     long free_pages = zone_nr_free_pages(z) - (1 << order) + 1;
>>       int o;
>>
>>       if (alloc_flags & ALLOC_HIGH)
>> @@ -2413,7 +2413,7 @@ void show_free_areas(void)
>>                       " all_unreclaimable? %s"
>>                       "\n",
>>                       zone->name,
>> -                     K(zone_page_state(zone, NR_FREE_PAGES)),
>> +                     K(zone_nr_free_pages(zone)),
>>                       K(min_wmark_pages(zone)),
>>                       K(low_wmark_pages(zone)),
>>                       K(high_wmark_pages(zone)),
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 7759941..c95a159 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -143,6 +143,9 @@ static void refresh_zone_stat_thresholds(void)
>>               for_each_online_cpu(cpu)
>>                       per_cpu_ptr(zone->pageset, cpu)->stat_threshold
>>                                                       = threshold;
>> +
>> +             zone->percpu_drift_mark = high_wmark_pages(zone) +
>> +                                     num_online_cpus() * threshold;
>>       }
>>  }
>
> Hm, this one I don't quite get (might be the jetlag, though): we have
> _at least_ NR_FREE_PAGES free pages, there may just be more lurking in

We can't be sure of that.
As I said in my previous mail, the current allocation path decrements
NR_FREE_PAGES after it removes pages from the buddy list.

> the pcp counters.
>
> So shouldn't we only collect the pcp deltas in case the high watermark
> is breached?  Above this point, we should be fine or better, no?

If we ignore the allocation path, I agree with Hannes.
At the least, we need to hear why Mel chose this threshold. :)
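
To make the threshold question concrete, here is a rough userspace model
of the counter drift. It is a sketch under assumptions, not kernel code:
all model_* names and MODEL_NR_CPUS are invented for illustration, the
fold rule is my reading of how the per-cpu deltas are flushed once they
exceed stat_threshold, and the kswapd check from the patch is ignored.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4		/* hypothetical CPU count for the model */

/* Userspace stand-in for a zone's free-page accounting. */
struct model_zone {
	long global_free;		/* models the global NR_FREE_PAGES */
	long cpu_diff[MODEL_NR_CPUS];	/* models vm_stat_diff[NR_FREE_PAGES] */
	long threshold;			/* models stat_threshold */
	long high_wmark;		/* models high_wmark_pages(zone) */
	long drift_mark;		/* models zone->percpu_drift_mark */
};

static void model_zone_init(struct model_zone *z, long free_pages,
			    long high_wmark, long threshold)
{
	int cpu;

	z->global_free = free_pages;
	z->threshold = threshold;
	z->high_wmark = high_wmark;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		z->cpu_diff[cpu] = 0;
	/* Same formula as the hunk in refresh_zone_stat_thresholds(). */
	z->drift_mark = high_wmark + MODEL_NR_CPUS * threshold;
}

/* Account a change on one CPU; fold into the global counter only once
 * the pending delta goes beyond the threshold in either direction. */
static void model_mod_free(struct model_zone *z, int cpu, long delta)
{
	long x;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	x = z->cpu_diff[cpu] + delta;
	if (x > z->threshold || x < -z->threshold) {
		z->global_free += x;
		x = 0;
	}
	z->cpu_diff[cpu] = x;
}

/* Cheap read: global counter only. */
static long model_cheap_free(const struct model_zone *z)
{
	return z->global_free;
}

/* Better estimate: below the drift mark, add the pending deltas. */
static long model_exact_free(const struct model_zone *z)
{
	long n = z->global_free;
	int cpu;

	if (n < z->drift_mark)
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			n += z->cpu_diff[cpu];
	return n;
}
```

In this model, with 60 free pages, a high watermark of 50 and a
threshold of 8, letting each of the 4 CPUs allocate 8 pages one at a
time leaves the global counter at 60 while only 28 pages are really
free. The cheap read still looks above the high watermark, and the
summed read does not. The worst case the global counter can hide is
MODEL_NR_CPUS * threshold pages, which is where the drift mark formula
seems to come from.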

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .