On Tue, Jan 18, 2011 at 12:36 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
> On Tue, 18 Jan 2011, Ying Han wrote:
>
>> I agree that "min_free_kbytes" concept doesn't apply well since there
>> is no notion of "reserved pool" in memcg. I borrowed it at the
>> beginning is to add a tunable to the per-memcg watermarks besides the
>> hard_limit.
>
> You may want to add a small amount of memory that a memcg may allocate
> from in oom conditions, however: memory reserves are allocated per-zone
> and if the entire system is oom and that includes several dozen memcgs,
> for example, they could all be contending for the same memory reserves.
> It would be much easier to deplete all reserves since you would have
> several tasks allowed to allocate from this pool: that's not possible
> without memcg since the oom killer is serialized on zones and does not
> kill a task if another oom killed task is already detected in the
> tasklist.

So something like a per-memcg min_wmark, which would also need to be
reserved upfront?

> I think it would be very trivial to DoS the entire machine in this way:
> set up a thousand memcgs with tasks that have core_state, for example, and
> trigger them to all allocate anonymous memory up to their hard limit so
> they oom at the same time. The machine should livelock with all zones
> having 0 pages free.
>
>> I read the
>> patch posted from Satoru Moriya "Tunable watermarks", and introducing
>> the per-memcg-per-watermark tunable
>> sounds good to me. Might consider adding it to the next post.
>>
>
> Those tunable watermarks were nacked for a reason: they are internal to
> the VM and should be set to sane values by the kernel with no intervention
> needed by userspace. You'd need to show why a memcg would need a user to
> tune its watermarks to trigger background reclaim and why that's not
> possible by the kernel and how this is a special case in comparison to the
> per-zone watermarks used by the VM.

KAMEZAWA gave an example in his earlier post: some enterprise users like
to keep a fixed amount of free pages regardless of the hard_limit. Since
setting the wmarks affects the reclaim behavior of each memcg, adding
this flexibility helps systems that want to treat memcgs differently
based on priority.

--Ying