Re: [RFC] Making memcg track ownership per address_space or anon_vma

Hello, Greg.

On Thu, Jan 29, 2015 at 09:55:53PM -0800, Greg Thelen wrote:
> I find simplification appealing.  But I'm not sure it will fly, if for no
> other reason than the shared accounting.  I'm ignoring intentional
> sharing, used by carefully crafted apps, and just thinking about
> incidental sharing (e.g. libc).
> 
> Example:
> 
> $ mkdir small
> $ echo 1M > small/memory.limit_in_bytes
> $ (echo $BASHPID > small/cgroup.procs && exec sleep 1h) &
> 
> $ mkdir big
> $ echo 10G > big/memory.limit_in_bytes
> $ (echo $BASHPID > big/cgroup.procs && exec mlockall_database 1h) &
> 
> Assuming big/mlockall_database mlocks all of libc, then it will oom kill
> the small memcg because libc is owned by small due to it having touched it
> first.  It'd be hard to figure out what small did wrong to deserve the
> oom kill.

The previous behavior was pretty unpredictable in terms of shared file
ownership too.  I wonder whether the better thing to do here is either
charging cases like this to the common ancestor of the accessors or
splitting the charge equally among them, which might be doable for
read-only files.
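
To make the two options concrete, here's a rough sketch in Python.  It
is purely illustrative: none of these names exist in the kernel,
cgroups are modeled as plain path strings, and charges are counted in
pages.

```python
import posixpath


def common_ancestor(accessors):
    """Deepest cgroup path that is an ancestor of (or equal to) every accessor."""
    return posixpath.commonpath(accessors)


def charge_common_ancestor(accessors, nr_pages):
    """Option 1: charge the whole shared file to the common ancestor."""
    return {common_ancestor(accessors): nr_pages}


def charge_split(accessors, nr_pages):
    """Option 2: split the charge equally among the accessors.

    Leftover pages from the integer division go to the first cgroups in
    sorted order, so the total always adds up to nr_pages.
    """
    share, remainder = divmod(nr_pages, len(accessors))
    charges = {}
    for i, cgroup in enumerate(sorted(accessors)):
        charges[cgroup] = share + (1 if i < remainder else 0)
    return charges


# Hypothetical numbers: a 400-page shared library touched by both cgroups.
print(charge_common_ancestor(["/small", "/big"], 400))
print(charge_split(["/small", "/big"], 400))
```

With your example, the first policy would charge libc to the root
rather than to small, so small presumably wouldn't get OOM killed for
what big mlocked.  The second would still leave small carrying half
of the pages.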

> FWIW we've been using memcg writeback where inodes have a memcg
> writeback owner.  Once multiple memcgs write to an inode then the inode
> becomes writeback shared which makes it more likely to be written.  Once
> cleaned the inode is then again able to be privately owned:
> https://lkml.org/lkml/2011/8/17/200
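
If I'm reading the scheme right, per-inode writeback ownership
behaves roughly like the state machine below.  This is a hypothetical
Python model for discussion, not the actual patch; the class and
method names are made up and memcgs are just strings.

```python
SHARED = object()  # sentinel: inode has been dirtied by more than one memcg


class InodeWbOwner:
    """Toy model of an inode's memcg writeback owner."""

    def __init__(self):
        self.owner = None  # None means clean, no writeback owner yet

    def mark_dirty(self, memcg):
        """A memcg dirties the inode."""
        if self.owner is None:
            # First writer since the inode was last clean owns it privately.
            self.owner = memcg
        elif self.owner is not SHARED and self.owner != memcg:
            # A second, different memcg wrote: the inode becomes shared.
            self.owner = SHARED

    def cleaned(self):
        """Writeback finished; the inode can be privately owned again."""
        self.owner = None


inode = InodeWbOwner()
inode.mark_dirty("small")   # privately owned by small
inode.mark_dirty("big")     # now writeback shared
inode.cleaned()             # ownership resets
inode.mark_dirty("big")     # privately owned by big
```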

The problem is that it introduces deviations between memcg and
writeback / blkcg which will mess up pressure propagation.  Writeback
pressure can't be determined without its associated memcg and neither
can dirty balancing.  We can certainly simplify things by trading off
accuracy in places, but let's please try to do that consistently
throughout the stack, not just at the midpoint, so that we can say "if
you do this, it'll behave this way and you can see it showing up
there".  The thing is, if we leave it half-way, in time some will try
to actively exploit memcg's page granularity and we'll have to deal
with writeback behavior which is difficult to even characterize.

Thanks.

-- 
tejun
