On Wed 29-01-14 11:08:46, Greg Thelen wrote:
[...]
> The series looks useful.  We (Google) have been using something similar.
> In practice such a low_limit (or memory guarantee), doesn't nest very
> well.
>
> Example:
>   - parent_memcg: limit 500, low_limit 500, usage 500
>     1 privately charged non-reclaimable page (e.g. mlock, slab)
>   - child_memcg: limit 500, low_limit 500, usage 499

I am not sure this is a good example. Your setup basically says that no
single page should be reclaimed. I can imagine this might be useful in
some cases and I would like to allow it, but it sounds too extreme
(e.g. a load which would start thrashing heavily once the reclaim kicks
in, where it makes more sense to restart it rather than let it crawl -
think about some mathematical simulation which might diverge).

> If a streaming file cache workload (e.g. sha1sum) starts gobbling up
> page cache it will lead to an oom kill instead of reclaiming.

Does it make any sense to protect all of such memory although it is
easily reclaimable?

> One could
> argue that this is working as intended because child_memcg was promised
> 500 but can only get 499.  So child_memcg is oom killed rather than
> being forced to operate below its promised low limit.
>
> This has led to various internal workarounds like:
> - don't charge any memory to interior tree nodes (e.g. parent_memcg);
>   only charge memory to cgroup leafs.  This gets tricky when dealing
>   with reparented memory inherited to parent from child during cgroup
>   deletion.

Do those need any protection at all?

> - don't set low_limit on non leafs (e.g. do not set low limit on
>   parent_memcg).  This constrains the cgroup layout a bit.  Some
>   customers want to purchase $MEM and setup their workload with a few
>   child cgroups.  A system daemon hands out $MEM by setting low_limit
>   for top-level containers (e.g. parent_memcg).  Thereafter such
>   customers are able to partition their workload with sub memcg below
>   child_memcg.  Example:
>       parent_memcg
>           \
>            child_memcg
>             /      \
>         server    backup

I think that the low_limit makes sense where you actually want to
protect something from reclaim, and backup sounds like a bad fit for
that.

> Thereafter customers often want some weak isolation between server and
> backup.  To avoid undesired oom kills the server/backup isolation is
> provided with a softer memory guarantee (e.g. soft_limit).  The soft
> limit acts like the low_limit until priority becomes desperate.

Johannes was already suggesting that the low_limit should allow for a
weaker semantic as well. I am not very much inclined to that, but I can
live with a knob which would say oom_on_lowlimit (on by default but
allowed to be set to 0). We would then fall back to the full reclaim if
no groups turn out to be reclaimable.
-- 
Michal Hocko
SUSE Labs