Re: [RFC][PATCH 0/13] Per-container dcache management (and a bit more)

On Tue, May 03, 2011 at 04:14:37PM +0400, Pavel Emelyanov wrote:
> Hi.
> 
> According to the "release early, release often" strategy :) I'm
> glad to propose this scratch implementation of what I was talking
> about at the LSF - the way to limit dcache growth for both
> containerized and non-containerized systems (the set applies to 2.6.38).

dcache growth is rarely the memory consumption problem in systems -
it's inode cache growth that is the issue. Each inode consumes 4-5x
as much memory as a dentry, and the dentry lifecycle is a subset of
the inode lifecycle.  Limiting the number of dentries will therefore
do very little to relieve memory problems.

Indeed, I actually get a request from embedded folks every so often
to limit the size of the inode cache - they never have troubles with
the size of the dentry cache (and I do ask) - so perhaps you need to
consider this aspect of the problem a bit more.

FWIW, I often see machines during tests where the dentry cache is
empty, yet there are millions of inodes cached on the inode LRU
consuming gigabytes of memory. e.g. a snapshot from my 4GB RAM test
VM right now:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
2180754 2107387  96%    0.21K 121153       18    484612K xfs_ili
2132964 2107389  98%    1.00K 533241        4   2132964K xfs_inode
1625922 944034   58%    0.06K  27558       59    110232K size-64
415320 415301    99%    0.19K  20766       20     83064K dentry

You can see 400k active dentries consuming 83MB of RAM, yet 2.1M
active inodes consuming ~2.6GB of RAM. We've already reclaimed the
dentry cache down quite small, while the inode cache remains the
dominant memory consumer.

I'm also concerned about the scalability issues - moving back to
global lists and locks for LRU, shrinker and mob management is the
opposite of the direction we are taking. We want to make the LRUs
more fine-grained and more closely related to the MM structures, to
confine shrinkers to per-sb context (no more lifecycle issues,
ever), and to operate per-node/-zone rather than globally. It seems
to me that this containerisation will make much of that work
difficult to achieve effectively because it doesn't take any of this
ongoing scalability work into account.

> The first 5 patches are preparations for this, descriptive (I hope)
> comments are inside them.
> 
> The general idea of this set is -- make the dentries subtrees be
> limited in size and shrink them as they hit the configured limit.

And what about the inode cache, which does not shrink with it?

> Why subtrees? Because this lets having the [dentry -> group] reference
> without the reference count, letting the [dentry -> parent] one handle
> this.
> 
> Why limited? For containers the answer is simple -- a container
> should not be allowed to consume too much of the host memory. For
> non-containerized systems the answer is -- to protect the kernel
> from the non-privileged attacks on the dcache memory like the 
> "while :; do mkdir x; cd x; done" one and similar.

Which will stop as soon as the path gets too long. And if this is
really a problem on your systems, quotas can prevent it from ever
being an issue.

> What isn't in this patch yet, but should be done after the discussion
> 
> * API. I haven't managed to invent any perfect solution, and would
> really like to have it discussed. In order to be able to play with it 
> the ioctls + proc for listing are proposed.
> 
> * New mounts management. Right now if you mount some new FS to a
> dentry which belongs to some managed set (I named it "mob" in this
> patchset), the new mount is managed with the system settings. This is
> not OK, the new mount should be managed with the settings of the
> mountpoint's mob.
> 
> * Elegant shrink_dcache_memory on global memory shortage. For now the
> code walks the mobs and shrinks some equal amount of dentries from them.
> A better shrinking policy can and probably should be implemented.

See above.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

