On Tue, 18 Jan 2011 13:10:39 -0800
Ying Han <yinghan@xxxxxxxxxx> wrote:

> On Tue, Jan 18, 2011 at 12:36 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
> > On Tue, 18 Jan 2011, Ying Han wrote:
> >
> >> I agree that the "min_free_kbytes" concept doesn't apply well since there
> >> is no notion of a "reserved pool" in memcg. I borrowed it at the
> >> beginning to add a tunable to the per-memcg watermarks besides the
> >> hard_limit.
> >
> > You may want to add a small amount of memory that a memcg may allocate
> > from in oom conditions, however: memory reserves are allocated per-zone
> > and if the entire system is oom and that includes several dozen memcgs,
> > for example, they could all be contending for the same memory reserves.
> > It would be much easier to deplete all reserves since you would have
> > several tasks allowed to allocate from this pool: that's not possible
> > without memcg since the oom killer is serialized on zones and does not
> > kill a task if another oom killed task is already detected in the
> > tasklist.
>
> So something like a per-memcg min_wmark which also needs to be reserved upfront?
>

I think the variable name 'min_free_kbytes' is the source of confusion...
It's just a watermark to trigger background reclaim. It's not a reservation.

> > I think it would be very trivial to DoS the entire machine in this way:
> > set up a thousand memcgs with tasks that have core_state, for example, and
> > trigger them to all allocate anonymous memory up to their hard limit so
> > they oom at the same time.  The machine should livelock with all zones
> > having 0 pages free.
> >
> >> I read the
> >> patch posted by Satoru Moriya, "Tunable watermarks", and introducing
> >> the per-memcg-per-watermark tunable
> >> sounds good to me. Might consider adding it to the next post.
> >>
> >
> > Those tunable watermarks were nacked for a reason: they are internal to
> > the VM and should be set to sane values by the kernel with no intervention
> > needed by userspace.  You'd need to show why a memcg would need a user to
> > tune its watermarks to trigger background reclaim and why that's not
> > possible by the kernel and how this is a special case in comparison to the
> > per-zone watermarks used by the VM.
>
> KAMEZAWA gave an example in his earlier post, where some enterprise users
> like to keep a fixed amount of free pages
> regardless of the hard_limit.
>
> Since setting the wmarks has an impact on the reclaim behavior of each
> memcg, adding this flexibility helps systems that would like to
> treat memcgs differently based on priority.
>

Please add some tricks to throttle the CPU usage of kswapd-for-memcg even
when the user sets some bad value.

And the total number of threads/workers for all memcgs should be throttled, too.
(I think this parameter can be a sysctl or a root cgroup parameter.)

Thanks,
-Kame
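A minimal C sketch of the distinction Kame draws above: a per-memcg watermark only decides when background reclaim wakes up and when it may stop; no pages are set aside for the memcg. All names here (struct memcg_wmark, should_wake_bg_reclaim(), bg_reclaim_done(), max_bg_workers, try_start_bg_worker()) are hypothetical and not taken from the posted patch. The watermarks are assumed to be measured as free headroom below the hard limit, mirroring the per-zone low/high watermarks, and the worker cap stands in for the sysctl or root-cgroup parameter suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-memcg watermarks (assumption: measured as free
 * headroom = limit - usage). They are thresholds only: nothing here
 * reserves memory for the memcg.
 */
struct memcg_wmark {
	uint64_t limit;      /* hard limit on charged memory */
	uint64_t low_wmark;  /* wake background reclaim below this headroom */
	uint64_t high_wmark; /* background reclaim may stop at this headroom */
};

/* Free headroom under the hard limit; 0 if usage is at or over the limit. */
static uint64_t headroom(const struct memcg_wmark *w, uint64_t usage)
{
	return usage >= w->limit ? 0 : w->limit - usage;
}

/* True if the current usage should wake the background reclaimer. */
static bool should_wake_bg_reclaim(const struct memcg_wmark *w, uint64_t usage)
{
	return headroom(w, usage) < w->low_wmark;
}

/* True once background reclaim has restored enough headroom to stop. */
static bool bg_reclaim_done(const struct memcg_wmark *w, uint64_t usage)
{
	return headroom(w, usage) >= w->high_wmark;
}

/*
 * Hypothetical global cap on background-reclaim workers for all memcgs,
 * so a bad per-memcg setting cannot spawn unbounded reclaim work.
 * Not synchronized: a single-threaded illustration only.
 */
static unsigned int max_bg_workers = 4;
static unsigned int active_bg_workers;

/* Returns false (caller should defer) if the cap is already reached. */
static bool try_start_bg_worker(void)
{
	if (active_bg_workers >= max_bg_workers)
		return false;
	active_bg_workers++;
	return true;
}

/* Releases one worker slot. */
static void finish_bg_worker(void)
{
	if (active_bg_workers > 0)
		active_bg_workers--;
}
```

The gap between low_wmark and high_wmark gives hysteresis, so the reclaimer is not woken on every charge near the limit. A real implementation would need locking or atomics around the worker count and would also have to bound the CPU time each worker spends, as requested above.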