Re: [RFC][PATCH] memcg: remove PCG_ACCT_LRU.

Hugh Dickins <hughd@xxxxxxxxxx> · Tue, 6 Dec 2011 22:30:37 -0800 (PST)

On Wed, 7 Dec 2011, KAMEZAWA Hiroyuki wrote:
> On Tue, 6 Dec 2011 15:50:33 -0800 (PST)
> Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> > On Tue, 6 Dec 2011, KAMEZAWA Hiroyuki wrote:
> > > 
> > > 	1. lock lru
> > > 	2. remove-page-from-lru
> > > 	3. overwrite pc->mem_cgroup
> > > 	4. add page to lru again
> > > 	5. unlock lru
> > 
> > That is indeed the sequence which __mem_cgroup_commit_charge() follows
> > after the patch.
> > 
> > But it optimizes out the majority of cases when no such lru operations
> > are needed (optimizations best presented in a separate patch), while
> > being careful about the tricky case when the page is on lru_add_pvecs,
> > and may get on to an lru at any moment.
> > 
> > And since it uses a separate lock for each memcg-zone's set of lrus,
> > must take care that both lock and lru in 4 and 5 are different from
> > those in 1 and 2.
> > 
> 
> yes, after per-zone-per-memcg lock, Above sequence should take some care.
> 
> With naive solution,
> 
> 	1. get lruvec-1 from target pc->mem_cgroup
> 	2. get lruvec-2 from target memcg to be charged.
> 	3. lock lruvec-x lock
> 	4. lock lruvec-y lock   (x and y order is determined by css_id ?)
> 	5. remove from LRU.
> 	6. overwrite pc->mem_cgroup
> 	7. add page to lru again
> 	8. unlock lruvec-y
> 	9. unlokc lruvec-x
> 
> Hm, maybe there are another clever way..

Our commit_charge does lock page_cgroup, lock old lru_lock, remove from
old lru, update pc->mem_cgroup, unlock old lru_lock, lock new lru_lock,
add to new lru, unlock page_cgroup.  That's complemented by the way
lock_page_lru_irqsave locks lru_lock and then checks if the lru_lock
it got still matches pc->mem_cgroup, retrying if not.

> > > 
> > > isn't it ? I posted a series of patch. I'm glad if you give me a
> > > quick review.
> > 
> > I haven't glanced yet, will do so after an hour or two.
> > 
> 
> I think Johannes's chages of removing page_cgroup->lru allows us
> various chances of optimization/simplification.

Yes, I like Johannes's changes very much, they do indeed open the
way to a lot of simplification and unification.

I have now taken a quickish look at your patches, and tried running
with them.  They look plausible and elegant.  In some places they do
the same as we have done, in others somewhat the opposite.

You tend to rely on knowing when file, anon, shmem and swap pages
are charged, making simplifications based upon SwapCache or not;
whereas I was more ignorant and more general.  Each approach has
its own merit.

Your lrucare nests page_cgroup lock inside lru_lock, and handles the
page on pagevec case very easily that way; whereas we nest lru_lock
inside page_cgroup lock.  I think your way is fine for now, but that
we shall have to reverse it for per-memcg-zone lru locking.

I am so used to thinking in terms of per-memcg-zone lru locking, that
it's hard for me to remember the easier constraints in your case.
We have to treat pc->mem_cgroup more carefully than you do, because
of it telling where the lock is.

I'm not sure whether you're safe to be leaving stale pc->mem_cgroup
behind, potentially after that memcg has been deleted.  We would not
be safe that way (particularly when lumpy reclaim and compaction
come into play), but perhaps you're okay if you've caught everywhere
that needs mem_cgroup_reset_owner.  Or perhaps not.

I did get one crash when shutting down, stack somewhere in khugepaged:
I didn't take much notice because I thought it would easily happen
again, but actually not the next time.  I expect that would have been
from a stale or null pc->mem_cgroup.

It was amusing to see you inserting "mem_cgroup_reset_owner" calls in
read_swap_cache_async and ksm_does_need_to_copy: yes, that's exactly
where we put some of our "mem_cgroup_reset_page" calls, though last
weekend I reworked the patch to avoid the need for them.

I'll mull over your approach, and try it on other machines overnight.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>