Re: [PATCH 1/4] Add kswapd descriptor.

On Tue, 7 Dec 2010 17:24:12 -0800
Ying Han <yinghan@xxxxxxxxxx> wrote:

> On Tue, Dec 7, 2010 at 4:39 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > On Tue, 7 Dec 2010 09:28:01 -0800
> > Ying Han <yinghan@xxxxxxxxxx> wrote:
> >
> >> On Tue, Dec 7, 2010 at 4:33 AM, Mel Gorman <mel@xxxxxxxxx> wrote:
> >
> >> > Potentially there will
> >> > also be a very large number of new IO sources. I confess I haven't read the
> >> > thread yet so maybe this has already been thought of but it might make sense
> >> > to have a 1:N relationship between kswapd and memcgroups and cycle between
> >> > containers. The difficulty will be the latency between when kswapd wakes up
> >> > and when a particular container is scanned. The closer the ratio is to 1:1,
> >> > the lower the latency will be but the higher the contention on the LRU lock
> >> > and IO will be.
> >>
> >> No, we haven't talked about the mapping anywhere in the thread. Having
> >> many kswapd threads at the same time isn't a problem as long as there is
> >> no locking contention (e.g., 1k kswapd threads on a 1k fake numa node
> >> system). So breaking the zone->lru_lock should work.
> >>
> >
> > It was me who made zone->lru_lock shared. A per-memcg lock will make the
> > maintenance of memcg very bad and will add many races. Otherwise we need to
> > make memcg's LRU not synchronized with the zone's LRU, IOW, we need to have
> > a completely independent LRU.
> >
> > I'd like to limit the number of kswapd-for-memcg threads if zone->lru_lock
> > contention is problematic. memcg _can_ work without background reclaim.
> 
> >
> > How about adding a per-node kswapd-for-memcg which reclaims pages on a
> > memcg's request? Something like:
> >
> >        memcg_wake_kswapd(struct mem_cgroup *mem)
> >        {
> >                do {
> >                        nid = select_victim_node(mem);
> >                        /* ask kswapd to reclaim memcg's memory */
> >                        ret = memcg_kswapd_queue_work(nid, mem); /* may return -EBUSY if very busy */
> >                } while (ret == -EBUSY);
> >        }
> >
> > This will keep lock contention to a minimum. Anyway, using too much cpu for this
> > unnecessary_but_good_for_performance_function is bad. Throttling is required.
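
To be a bit more concrete, the queueing side of the above could look roughly
like the below. This is only a rough sketch: memcg_kswapd_queue, the
kswapd_work list_head assumed to be added to struct mem_cgroup, and
MEMCG_KSWAPD_QUEUE_MAX are all made-up names, not existing kernel symbols,
and queue init / memcg refcounting are omitted.

#include <linux/errno.h>
#include <linux/list.h>
#include <linux/nodemask.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

#define MEMCG_KSWAPD_QUEUE_MAX  16      /* throttle: bound the per-node backlog */

struct memcg_kswapd_queue {
        spinlock_t              lock;
        struct list_head        works;          /* memcgs waiting for reclaim on this node */
        int                     nr_works;
        wait_queue_head_t       waitq;          /* per-node kswapd-for-memcg sleeps here */
};

static struct memcg_kswapd_queue memcg_kswapd_queues[MAX_NUMNODES];

/*
 * Queue a reclaim request for @mem against node @nid.  Returns -EBUSY when
 * the node already has a full backlog, so the caller backs off (or picks
 * another victim node) instead of piling up work.
 */
static int memcg_kswapd_queue_work(int nid, struct mem_cgroup *mem)
{
        struct memcg_kswapd_queue *q = &memcg_kswapd_queues[nid];
        int ret = 0;

        spin_lock(&q->lock);
        if (q->nr_works >= MEMCG_KSWAPD_QUEUE_MAX) {
                ret = -EBUSY;
        } else if (list_empty(&mem->kswapd_work)) {     /* not queued yet */
                list_add_tail(&mem->kswapd_work, &q->works);
                q->nr_works++;
        }
        spin_unlock(&q->lock);

        if (!ret)
                wake_up_interruptible(&q->waitq);
        return ret;
}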
> 
> I don't see the problem of one-kswapd-per-cgroup here since there will
> be no performance cost if they are not running.
> 
Yes. But we got a report, about a year ago, from a user who runs 2000+ cgroups on his host
(on the libcgroup mailing list).

So, running 2000+ mostly-idle threads will be bad. It's a cost.
In theory, the number of memcgs can be 65534.

> I haven't measured the lock contention and cputime for each kswapd
> running. Theoretically it would be a problem if thousands of cgroups are
> configured on the host and all of them are under memory pressure.
> 
I think that's a configuration mistake. 

> We can either optimize the locking or make each kswapd smarter (hold the
> lock for less time). My current plan is to have one-kswapd-per-cgroup in
> the V2 patch w/ select_victim_node, and the optimization for this comes
> in a follow-up patchset.
> 

My point above is that holding a remote node's lock and touching a remote
node's pages increases memory reclaim cost very much. That is why I like the
per-node approach.
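
Concretely, each node's kswapd-for-memcg would then only ever touch its own
node's LRU. Again just a sketch under the same made-up names as above
(memcg_kswapd_queues, kswapd_work); try_to_free_mem_cgroup_pages_node() is a
hypothetical node-aware reclaim helper, and memcg refcounting is omitted:

#include <linux/kthread.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

static int memcg_kswapd_thread(void *data)
{
        int nid = (long)data;   /* this thread's node */
        struct memcg_kswapd_queue *q = &memcg_kswapd_queues[nid];

        while (!kthread_should_stop()) {
                struct mem_cgroup *mem = NULL;

                wait_event_interruptible(q->waitq,
                        !list_empty(&q->works) || kthread_should_stop());

                spin_lock(&q->lock);
                if (!list_empty(&q->works)) {
                        mem = list_first_entry(&q->works,
                                               struct mem_cgroup, kswapd_work);
                        list_del_init(&mem->kswapd_work);
                        q->nr_works--;
                }
                spin_unlock(&q->lock);

                if (mem)
                        /* reclaim @mem's pages from node @nid only */
                        try_to_free_mem_cgroup_pages_node(mem, nid);
        }
        return 0;
}

So a single thread per node scans only local LRUs, and a memcg just queues
itself to whatever node select_victim_node() picks.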

Thanks,
-Kame
