On Mon, Aug 29, 2011 at 12:22:02AM -0700, Ying Han wrote: > On Mon, Aug 29, 2011 at 12:15 AM, Ying Han <yinghan@xxxxxxxxxx> wrote: > > On Thu, Aug 11, 2011 at 2:09 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > >> > >> On Thu, Aug 11, 2011 at 01:39:45PM -0700, Ying Han wrote: > >> > Please consider including the following patch for the next post. It causes > >> > crash on some of the tests where sc->mem_cgroup is NULL (global kswapd). > >> > > >> > diff --git a/mm/vmscan.c b/mm/vmscan.c > >> > index b72a844..12ab25d 100644 > >> > --- a/mm/vmscan.c > >> > +++ b/mm/vmscan.c > >> > @@ -2768,7 +2768,8 @@ loop_again: > >> > * Do some background aging of the anon list, to > >> > give > >> > * pages a chance to be referenced before > >> > reclaiming. > >> > */ > >> > - if (inactive_anon_is_low(zone, &sc)) > >> > + if (scanning_global_lru(&sc) && > >> > + inactive_anon_is_low(zone, &sc)) > >> > shrink_active_list(SWAP_CLUSTER_MAX, zone, > >> > &sc, priority, 0); > >> > >> Thanks! I completely overlooked this one and only noticed it after > >> changing the arguments to shrink_active_list(). > >> > >> On memcg configurations, scanning_global_lru() will essentially never > >> be true again, so I moved the anon pre-aging to a separate function > >> that also does a hierarchy loop to preage the per-memcg anon lists. > >> > >> I hope to send out the next revision soon. > > > > Also, please consider to fold in the following patch as well. It fixes > > the root cgroup lru accounting and we could easily trigger OOM while > > doing some swapoff test w/o it. > > > > mm:fix the lru accounting for root cgroup. > > > > This patch is applied on top of: > > " > > mm: memcg-aware global reclaim > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > > " > > > > This patch fixes the lru accounting for root cgroup. > > > > After the "memcg-aware global reclaim" patch, one of the changes is to have > > lru pages linked back to root. Under the global memory pressure, we start from > > the root cgroup lru and walk through the memcg hierarchy of the system. For > > each memcg, we reclaim pages based on the its lru size. > > > > However for root cgroup, we used not having a seperate lru and only counting > > the pages charged to root as part of root lru size. Without this patch, all > > the pages which are linked to root lru but not charged to root like swapcache > > readahead are not visible to page reclaim code and we are easily to get OOM. > > > > After this patch, all the pages linked under root lru are counted in the lru > > size, including Used and !Used. > > > > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> > > Signed-off-by: Ying Han <yinghan@xxxxxxxxxx> > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 5518f54..f6c5f29 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -888,19 +888,21 @@ void mem_cgroup_del_lru_list(struct page *page, > > enum lru_list lru) > > { > > >------struct page_cgroup *pc; > > >------struct mem_cgroup_per_zone *mz; > > +>------struct mem_cgroup *mem; > > · > > >------if (mem_cgroup_disabled()) > > >------>-------return; > > >------pc = lookup_page_cgroup(page); > > ->------/* can happen while we handle swapcache. */ > > ->------if (!TestClearPageCgroupAcctLRU(pc)) > > ->------>-------return; > > ->------VM_BUG_ON(!pc->mem_cgroup); > > ->------/* > > ->------ * We don't check PCG_USED bit. It's cleared when the "page" is finally > > ->------ * removed from global LRU. > > ->------ */ > > ->------mz = page_cgroup_zoneinfo(pc->mem_cgroup, page); > > + > > +>------if (TestClearPageCgroupAcctLRU(pc) || PageCgroupUsed(pc)) { This PageCgroupUsed part confuses me. A page that is being isolated shortly after being charged while on the LRU may reach here, and then it is unaccounted from pc->mem_cgroup, which it never was accounted to. Could you explain why you added it? I now made it so that PageCgroupAcctLRU on the LRU means accounted to pc->mem_cgroup, and !PageCgroupAcctLRU on the LRU means accounted to and babysitted by root_mem_cgroup. Always. Which also means that before_commit now ensures an LRU page is moved to root_mem_cgroup for babysitting during the charge, so that concurrent isolations/putbacks are always accounted correctly. Is this what you had in mind? Did I miss something? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>