On Tue, 7 Dec 2010 18:10:11 -0800 Ying Han <yinghan@xxxxxxxxxx> wrote:
>
> >> I haven't measured the lock contention and cputime for each kswapd
> >> running. Theoretically it would be a problem if thousands of cgroups
> >> are configured on the host and all of them are under memory pressure.
> >>
> > I think that's a configuration mistake.
> >
> >> We can either optimize the locking or make each kswapd smarter (hold
> >> the lock for less time). My current plan is to have the
> >> one-kswapd-per-cgroup design in the V2 patch w/ select_victim_node,
> >> and the optimization for this comes as a follow-up patchset.
> >>
> >
> > My point above is that holding a remote node's lock and touching a
> > remote node's pages increases memory reclaim cost very much. That is
> > why I like the per-node approach.
>
> So in the case of one physical node and thousands of cgroups, we are
> queuing all the work into a single kswapd, which is doing the global
> background reclaim as well. This could be a problem on a multi-core
> system, where all the cgroups queue behind the current work being
> throttled, which might not be necessary.

A percpu thread is enough. And there is direct reclaim; the absence of
kswapd will not be critical (because memcg doesn't need 'zone
balancing'). And, as you said, 'usual' users will not use 100+ cgroups.
Queueing will not be fatal, I think.

> I am not sure which way is better at this point. I would like to keep
> the current implementation for the next post (V2), since smaller
> changes between versions sound better to me.

Yes, please go ahead. I'm not against the functionality itself.

Thanks,
-Kame
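
For context, here is a minimal sketch of the single-worker queueing
pattern being debated above, built on the stock kernel workqueue API
(queue_work, create_singlethread_workqueue, container_of). The names
memcg_reclaim_work, memcg_bg_reclaim, memcg_kswapd_wq, and
wakeup_memcg_kswapd are illustrative assumptions, not the actual patch:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct mem_cgroup;			/* opaque for this sketch */

/* Hypothetical per-cgroup request, embedded in each mem_cgroup. */
struct memcg_reclaim_work {
	struct work_struct work;
	struct mem_cgroup *memcg;
};

/* One worker thread shared by every cgroup. */
static struct workqueue_struct *memcg_kswapd_wq;

static void memcg_bg_reclaim(struct work_struct *work)
{
	struct memcg_reclaim_work *rw =
		container_of(work, struct memcg_reclaim_work, work);

	/*
	 * Reclaim pages for rw->memcg here. While this runs, every
	 * other cgroup's request sits queued behind it -- the
	 * serialization concern raised above. Tasks that cannot wait
	 * still fall back to direct reclaim, which is why the absence
	 * of a dedicated kswapd need not be critical.
	 */
}

/* Called when a cgroup crosses its background-reclaim watermark. */
static void wakeup_memcg_kswapd(struct memcg_reclaim_work *rw)
{
	/* queue_work() is a no-op if this work is already pending. */
	queue_work(memcg_kswapd_wq, &rw->work);
}

static int __init memcg_kswapd_init(void)
{
	memcg_kswapd_wq = create_singlethread_workqueue("memcg_kswapd");
	/* INIT_WORK(&rw->work, memcg_bg_reclaim) at cgroup creation. */
	return memcg_kswapd_wq ? 0 : -ENOMEM;
}

With this shape, moving from one shared thread to per-cpu workers is a
one-line change (create_workqueue() instead of the single-threaded
variant), which is roughly the trade-off discussed above: thread count
versus all cgroups serializing behind one worker.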