Hello, Greg.

On Thu, Jan 29, 2015 at 09:55:53PM -0800, Greg Thelen wrote:
> I find simplification appealing. But I not sure it will fly, if for no
> other reason than the shared accountings. I'm ignoring intentional
> sharing, used by carefully crafted apps, and just thinking about
> incidental sharing (e.g. libc).
>
> Example:
>
> $ mkdir small
> $ echo 1M > small/memory.limit_in_bytes
> $ (echo $BASHPID > small/cgroup.procs && exec sleep 1h) &
>
> $ mkdir big
> $ echo 10G > big/memory.limit_in_bytes
> $ (echo $BASHPID > big/cgroup.procs && exec mlockall_database 1h) &
>
> Assuming big/mlockall_database mlocks all of libc, then it will oom kill
> the small memcg because libc is owned by small due it having touched it
> first. It'd be hard to figure out what small did wrong to deserve the
> oom kill.

The previous behavior was pretty unpredictable in terms of shared file
ownership too. I wonder whether the better thing to do here is either
charging cases like this to the common ancestor or splitting the
charge equally among the accessors, which might be doable for ro
files.

> FWIW we've been using memcg writeback where inodes have a memcg
> writeback owner. Once multiple memcg write to an inode then the inode
> becomes writeback shared which makes it more likely to be written. Once
> cleaned the inode is then again able to be privately owned:
> https://lkml.org/lkml/2011/8/17/200

The problem is that it introduces deviations between memcg and
writeback / blkcg which will mess up pressure propagation. Writeback
pressure can't be determined without its associated memcg and neither
can dirty balancing.

We sure can simplify things by trading off accuracies at places but
let's please try to do that throughout the stack, not in the midpoint,
so that we can say "if you do this, it'll behave this way and you can
see it showing up there".
The thing is if we leave it half-way, in time, some will try to
actively exploit memcg's page granularity and we'll have to deal with
writeback behavior which is difficult to even characterize.

Thanks.

--
tejun
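
To make the three charging options for an incidentally shared file concrete (first-touch ownership as in the small/big example, charging to the common ancestor, and splitting equally among the accessors), here is a rough toy model. It is only a sketch and not kernel code: the `PARENT` table, the `charge()` helper, the policy names, and the 2 MiB size for libc are all assumptions made up for illustration.

```python
from fractions import Fraction

MiB = 1 << 20

# Hypothetical toy hierarchy (child -> parent); "root" has no parent.
PARENT = {"small": "root", "big": "root", "root": None}

# Limits taken from the example: small = 1M, big = 10G.
LIMITS = {"small": 1 * MiB, "big": 10 * 1024 * MiB}

# Assumed size of the shared file (libc); the real figure is not given.
LIBC_SIZE = 2 * MiB


def ancestors(cg):
    """Return the path from cg up to the root, inclusive, leaf first."""
    path = []
    while cg is not None:
        path.append(cg)
        cg = PARENT[cg]
    return path


def common_ancestor(cgroups):
    """Lowest cgroup that is an ancestor of (or equal to) every accessor."""
    shared = set(ancestors(cgroups[0]))
    for cg in cgroups[1:]:
        shared &= set(ancestors(cg))
    # ancestors() is ordered leaf -> root, so the first hit is the lowest.
    return next(cg for cg in ancestors(cgroups[0]) if cg in shared)


def charge(policy, accessors, size):
    """Return {cgroup: bytes charged} for one shared file of `size` bytes.

    `accessors` lists distinct cgroups in first-touch order.
    """
    if policy == "first_touch":
        return {accessors[0]: size}
    if policy == "common_ancestor":
        return {common_ancestor(accessors): size}
    if policy == "equal_split":
        share = Fraction(size, len(accessors))
        return {cg: share for cg in accessors}
    raise ValueError(f"unknown policy: {policy}")


def over_limit(charges):
    """Cgroups whose charge for this one file alone exceeds their limit."""
    return [cg for cg, amt in charges.items()
            if cg in LIMITS and amt > LIMITS[cg]]


if __name__ == "__main__":
    for policy in ("first_touch", "common_ancestor", "equal_split"):
        charges = charge(policy, ["small", "big"], LIBC_SIZE)
        print(policy, dict(charges), "over limit:", over_limit(charges))
```

Under these assumed numbers, only first-touch pushes small past its own limit from the shared file alone; the other two policies move the charge to the parent or divide it. The model says nothing about writeback or dirty balancing, which is where the concern about keeping memcg and blkcg consistent comes in.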