On Thu, Apr 21, 2011 at 10:59 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
On Thu, 21 Apr 2011 22:53:19 -0700
Ying Han <yinghan@xxxxxxxxxx> wrote:
> On Thu, Apr 21, 2011 at 10:00 PM, KAMEZAWA Hiroyuki <
> kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> > On Thu, 21 Apr 2011 21:49:04 -0700
> > Ying Han <yinghan@xxxxxxxxxx> wrote:
> >
> > > On Thu, Apr 21, 2011 at 9:36 PM, KAMEZAWA Hiroyuki <
> > > kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > >
> > > > On Thu, 21 Apr 2011 21:24:15 -0700
> > > > Ying Han <yinghan@xxxxxxxxxx> wrote:
> > > >
> > > > > > This patch creates a thread pool for memcg-kswapd. All memcgs which need
> > > > > > background reclaim are linked to a list, and memcg-kswapd picks up a
> > > > > > memcg from the list and runs reclaim.
> > > > >
> > > > > > The concern with using a per-memcg kswapd thread is the system overhead,
> > > > > > including memory and cputime.
> > > > >
> > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > > > > Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
> > > >
> > > > Thank you for merging. This seems ok to me.
> > > >
> > > > > Further development may make this better or replace the thread pool
> > > > > with something else, but I think this is good enough.
> > > >
> > >
> > > > Thank you for reviewing and Acking. At the same time, I am still wondering
> > > > about the thread-pool model, which I raised in the cover letter :)
> > >
> > > The per-memcg-per-kswapd model
> > > Pros:
> > > > 1. memory overhead per thread: the memory consumption would be
> > > > 8k * 1000 = 8M with 1k cgroups.
> > > 2. we see lots of threads at 'ps -elf'
> > >
> > > Cons:
> > > > 1. the implementation is simple and straightforward.
> > > 2. we can easily isolate the background reclaim overhead between cgroups.
> > > 3. better latency from memory pressure to actual start reclaiming
> > >
> > > The thread-pool model
> > > Pros:
> > > > 1. there is no isolation between memcg background reclaim, since the
> > > > memcg threads are shared.
> > > > 2. it is hard for visibility and debuggability. I have often seen some
> > > > kswapds running crazy, and we need a straightforward way to identify
> > > > which cgroup is causing the reclaim.
> > > > 3. potential starvation for some memcgs, if one work item gets stuck and
> > > > the rest of the work won't proceed.
> > >
> > > Cons:
> > > 1. save some memory resource.
> > >
> > > > In general, the per-memcg-per-kswapd implementation looks sane to me at
> > > > this point, especially since the shared memcg thread model will make
> > > > debugging issues very hard later.
> > >
> > > Comments?
> > >
> > Pros <-> Cons ?
> >
> > My idea is to add a trace point for memcg-kswapd and see what it is
> > doing now.
> > (We don't have too small trace point in memcg...)
> >
> > I don't think it's sane to create a kthread per memcg because we know there
> > is a user who makes hundreds/thousands of memcgs.
> >
> > And I think that creating more threads than the number of cpus, all doing
> > the same job, will cause much more difficult starvation and priority
> > inversion issues.
> > Keeping scheduling knobs/chances of jobs in memcg is important. I don't want
> > to give a hint to the scheduler because of a memcg internal issue.
> >
> > And, even if memcg-kswapd doesn't exist, memcg works (well?).
> > memcg-kswapd just helps make things better but does not do any critical job.
> > So, it's okay to have this as a best-effort service.
> > Of course, a better scheduling idea for picking up memcgs is welcome. It's
> > now round-robin.
> >
> Hmm. The concern I have is the debuggability. Let's say I am running a
> system and find memcg-3 running crazy. Is there a way to find out which
> memcg it is trying to reclaim pages from? Also, how do we charge the cputime
> of the shared memcg threads to the memcgs, if we wanted to?
>
Adding a counter for kswapd-scan and kswapd-reclaim, kswapd-pickup will show
you information; if necessary it's good to show some latency stat. I think
we can add enough information by adding stats (or debug by perf tools.)
I'll consider this a bit more.
Something like "kswapd_pgscan" and "kswapd_steal" per memcg? If we are going
with the thread pool, we definitely need to add more stats to give us enough
visibility into per-memcg background reclaim activity. Still, not sure about
the cpu-cycles.
--Ying
Thanks,
-Kame
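
For illustration only, the accounting discussed above could be modelled roughly
as follows. This is a user-space toy model in Python, not kernel code and not
the patch itself. The class and method names (MemcgStat, ReclaimPool, enqueue,
pickup, account) are hypothetical; only the counter names kswapd_pgscan and
kswapd_steal and the idea of a pickup counter come from the thread. It assumes
a shared pool that picks memcgs round-robin and charges each reclaim pass to
the memcg it was done for.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MemcgStat:
    """Hypothetical per-memcg background-reclaim counters."""
    kswapd_pgscan: int = 0   # pages scanned on behalf of this memcg
    kswapd_steal: int = 0    # pages actually reclaimed from this memcg
    kswapd_pickup: int = 0   # times a pool worker picked this memcg


class ReclaimPool:
    """Toy model of a shared worker pool serving many memcgs."""

    def __init__(self):
        self.queue = deque()   # memcgs waiting for background reclaim
        self.stats = {}        # memcg id -> MemcgStat
        self.current = None    # memcg being reclaimed right now, if any

    def enqueue(self, memcg_id):
        """Link a memcg to the list unless it is already queued."""
        self.stats.setdefault(memcg_id, MemcgStat())
        if memcg_id not in self.queue:
            self.queue.append(memcg_id)

    def pickup(self):
        """Take the next memcg in round-robin (FIFO) order."""
        if not self.queue:
            return None
        memcg_id = self.queue.popleft()
        self.stats[memcg_id].kswapd_pickup += 1
        self.current = memcg_id   # answers "which memcg is it reclaiming?"
        return memcg_id

    def account(self, memcg_id, scanned, reclaimed, still_over_watermark):
        """Charge one reclaim pass to the memcg and requeue it if needed."""
        stat = self.stats[memcg_id]
        stat.kswapd_pgscan += scanned
        stat.kswapd_steal += reclaimed
        self.current = None
        if still_over_watermark:
            self.enqueue(memcg_id)   # goes to the tail, so others run first
```

With counters like these, a worker that appears to be running crazy can be
traced back to a memcg by looking at which memcg's kswapd_pickup and
kswapd_pgscan values keep growing. Attributing cpu cycles would need a separate
per-memcg time counter, which this sketch does not attempt.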