Re: [PATCH 08/27] mm, vmscan: Simplify the logic deciding whether kswapd sleeps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 15, 2016 at 05:18:00PM +0200, Vlastimil Babka wrote:
> >@@ -1209,9 +1209,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
> >
> > 		arch_refresh_nodedata(nid, pgdat);
> > 	} else {
> >-		/* Reset the nr_zones and classzone_idx to 0 before reuse */
> >+		/* Reset the nr_zones, order and classzone_idx before reuse */
> > 		pgdat->nr_zones = 0;
> >-		pgdat->classzone_idx = 0;
> >+		pgdat->kswapd_order = 0;
> >+		pgdat->kswapd_classzone_idx = -1;
> > 	}
> >
> > 	/* we can use NODE_DATA(nid) from here */
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 4ce578b969da..d8cb483d5cad 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -6036,7 +6036,7 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
> > 	unsigned long end_pfn = 0;
> >
> > 	/* pg_data_t should be reset to zero when it's allocated */
> >-	WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);
> >+	WARN_ON(pgdat->nr_zones || pgdat->kswapd_classzone_idx);
> 
> Above you changed the reset value of kswapd_classzone_idx from 0 to -1, so
> won't this trigger? Also should we check kswapd_order that's newly reset
> too?
> 

Good spot. The memory initialisation paths are ok but node memory hotplug
was broken.

> >
> > 	reset_deferred_meminit(pgdat);
> > 	pgdat->node_id = nid;
> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >index 96bf841f9352..14b34eebedff 100644
> >--- a/mm/vmscan.c
> >+++ b/mm/vmscan.c
> >@@ -2727,7 +2727,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
> >
> > 	/* kswapd must be awake if processes are being throttled */
> > 	if (!wmark_ok && waitqueue_active(&pgdat->kswapd_wait)) {
> >-		pgdat->classzone_idx = min(pgdat->classzone_idx,
> >+		pgdat->kswapd_classzone_idx = min(pgdat->kswapd_classzone_idx,
> > 						(enum zone_type)ZONE_NORMAL);
> > 		wake_up_interruptible(&pgdat->kswapd_wait);
> > 	}
> >@@ -3211,6 +3211,12 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order,
> >
> > 	prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
> >
> >+	/* If kswapd has not been woken recently, then full sleep */
> >+	if (classzone_idx == -1) {
> >+		classzone_idx = balanced_classzone_idx = MAX_NR_ZONES - 1;
> >+		goto full_sleep;
> 
> This will skip the wakeup_kcompactd() part.
> 

I wrestled with this one. I decided to leave it alone on the grounds
that if kswapd has not been woken recently then compaction efforts also
have not failed and kcompactd is not required.

> >@@ -3311,38 +3316,25 @@ static int kswapd(void *p)
> > 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > 	set_freezable();
> >
> >-	order = new_order = 0;
> >-	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
> >-	balanced_classzone_idx = classzone_idx;
> >+	pgdat->kswapd_order = order = 0;
> >+	pgdat->kswapd_classzone_idx = classzone_idx = -1;
> > 	for ( ; ; ) {
> > 		bool ret;
> >
> >+kswapd_try_sleep:
> >+		kswapd_try_to_sleep(pgdat, order, classzone_idx, classzone_idx);
> 
> The last two parameters are now the same, remove one?
> 

Yes. A few more basic simplifications are then possible.

> >@@ -3352,12 +3344,19 @@ static int kswapd(void *p)
> > 		 * We can speed up thawing tasks if we don't call balance_pgdat
> > 		 * after returning from the refrigerator
> > 		 */
> >-		if (!ret) {
> >-			trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
> >+		if (ret)
> >+			continue;
> >
> >-			/* return value ignored until next patch */
> >-			balance_pgdat(pgdat, order, classzone_idx);
> >-		}
> >+		/*
> >+		 * Try reclaim the requested order but if that fails
> >+		 * then try sleeping on the basis of the order reclaimed.
> 
> Is the last word really meant to be "reclaimed", or "requested"?
> 

No, I really meant reclaimed.

> >+		 */
> >+		trace_mm_vmscan_kswapd_wake(pgdat->node_id, order);
> >+		if (balance_pgdat(pgdat, order, classzone_idx) < order)
> >+			goto kswapd_try_sleep;
> 
> AFAICS now kswapd_try_to_sleep() will use the "requested" order. That's
> needed for proper wakeup_kcompactd(), but won't it prevent kswapd from
> actually going to sleep, because zone_balanced() in prepare-sleep will be
> false? So I think you need to give it both orders to do the right thing?
> 

You're right. There is a risk that kswapd stays awake longer in high
fragmentation scenarios.

Should be fixed now by passing in both orders.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]