Re: [PATCH 6/10] mm/memcg: take care over pc->mem_cgroup

Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx> · Wed, 22 Feb 2012 08:05:49 +0400

Hugh Dickins wrote:
On Tue, 21 Feb 2012, KAMEZAWA Hiroyuki wrote:
On Mon, 20 Feb 2012 15:34:28 -0800 (PST)
Hugh Dickins<hughd@xxxxxxxxxx>  wrote:
	return NULL;

+	lruvec = page_lock_lruvec(page);
  	lock_page_cgroup(pc);

Do we need to take lrulock+irq disable per page in this very very hot path ?

I'm sure we don't want to: I hope you were pleased to find it goes away
(from most cases) a couple of patches later.

I had lruvec lock nested inside page_cgroup lock in the rollup I sent in
December, whereas you went for page_cgroup lock nested inside lruvec lock
in your lrucare patch.

I couldn't find an imperative reason why they should be one way round or
the other, so I tried hard to stick with your ordering, and it did work
(in this 6/10).  But then I couldn't work out how to get rid of the
overheads added in doing it this way round, so swapped them back.

Hmm.... How about adding NR_ISOLATED counter into lruvec ?

Then, we can delay freeing lruvec until all conunters goes down to zero.
as...

	bool we_can_free_lruvec = true;

	lock_lruvec(lruvec->lock);
	for_each_lru_lruvec(lru)
		if (!list_empty(&lruvec->lru[lru]))
			we_can_free_lruvec = false;
	if (lruvec->nr_isolated)
		we_can_free_lruvec = false;
	unlock_lruvec(lruvec)
	if (we_can_free_lruvec)
		kfree(lruvec);

If compaction, lumpy reclaim free a page taken from LRU,
it knows what it does and can decrement lruvec->nr_isolated properly
(it seems zone's NR_ISOLATED is decremented at putback.)

At the moment I'm thinking that what we end up with by 9/10 is
better than adding such a refcount.  But I'm not entirely happy with
mem_cgroup_reset_uncharged_to_root (it adds a further page_cgroup
lookup just after I got rid of some others), and need yet to think
about the race which Konstantin posits, so all options remain open.

This lruvec->nr_isolated seem reasonable, and its managegin not very costly.
In move_account() we anyway need to touch old_lruvec->lru_lock after recharge,
to stabilize PageLRU() before adding page to new_lruvec. (because that race)
In migration/compaction this handled automatically, because they always call putback_lru_page() at the end.
Main problem is shrink_page_list() for lumpy-reclaim, but seems like it never used if memory
compaction is enabled, so it can be slow and ineffective with tons of lru_list relocks.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>