On Tue, May 03, 2011 at 04:14:37PM +0400, Pavel Emelyanov wrote:
> Hi.
> 
> According to the "release early, release often" strategy :) I'm
> glad to propose this scratch implementation of what I was talking
> about at the LSF - the way to limit dcache growth for both
> containerized and non-containerized systems (the set applies to
> 2.6.38).

dcache growth is rarely the memory consumption problem in systems -
it's inode cache growth that is the issue. Each inode consumes 4-5x
as much memory as a dentry, and the dentry lifecycle is a subset of
the inode lifecycle. Because of this, limiting the number of dentries
will do very little to relieve memory problems.

Indeed, I actually get a request from embedded folks every so often
to limit the size of the inode cache - they never have trouble with
the size of the dentry cache (and I do ask) - so perhaps you need to
consider this aspect of the problem a bit more.

FWIW, I often see machines during tests where the dentry cache is
empty, yet there are millions of inodes cached on the inode LRU
consuming gigabytes of memory. e.g. a snapshot from my 4GB RAM test
VM right now:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
2180754 2107387  96%    0.21K 121153       18    484612K xfs_ili
2132964 2107389  98%    1.00K 533241        4   2132964K xfs_inode
1625922  944034  58%    0.06K  27558       59    110232K size-64
 415320  415301  99%    0.19K  20766       20     83064K dentry

You can see 400k active dentries consuming 83MB of RAM, while 2.1M
active inodes consume ~2.6GB of RAM. We've already reclaimed the
dentry cache down quite small, yet the inode cache remains the
dominant memory consumer.....

I'm also concerned about the scalability issues - moving back to
global lists and locks for LRU, shrinker and mob management is the
opposite of the direction we are taking. We want to make the LRUs
more fine-grained and more closely related to the MM structures, with
shrinkers confined to per-sb context (no more lifecycle issues, ever)
and operating per-node/-zone rather than globally, etc. It seems to
me that this containerisation will make much of that work difficult
to achieve effectively because it doesn't take any of this ongoing
scalability work into account.
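To make that direction concrete: the shrinker wants to be embedded in
the superblock, so its lifecycle is exactly the sb's lifecycle and it
only ever scans that sb's own LRU. A rough sketch of the shape - the
count/scan split, the struct names and the prune_sb_dentry_lru()
helper are all illustrative here, not code from any tree:

	struct super_block_sketch {
		struct shrinker		s_shrink;	/* per-sb, dies with the sb */
		struct list_head	s_dentry_lru;	/* per-sb dentry LRU */
		spinlock_t		s_lru_lock;
		long			s_nr_dentry_unused;
	};

	static unsigned long sb_shrink_count(struct shrinker *shrink,
					     struct shrink_control *sc)
	{
		struct super_block_sketch *sb =
			container_of(shrink, struct super_block_sketch,
				     s_shrink);

		/* cheap: report what is reclaimable, touch nothing */
		return sb->s_nr_dentry_unused;
	}

	static unsigned long sb_shrink_scan(struct shrinker *shrink,
					    struct shrink_control *sc)
	{
		struct super_block_sketch *sb =
			container_of(shrink, struct super_block_sketch,
				     s_shrink);

		/* walk only this sb's LRU, freeing at most nr_to_scan */
		return prune_sb_dentry_lru(sb, sc->nr_to_scan);
	}

Register it at mount, unregister it at unmount, and there is never a
shrinker callback racing with superblock teardown.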
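The LRU lists themselves want the same treatment - one list and one
lock per node, so that node- or zone-targeted reclaim only ever
touches local objects and local locks. Again, just a sketch with
made-up names:

	/* one LRU per node, padded so nodes don't share cachelines */
	struct node_lru {
		spinlock_t		lock;
		struct list_head	list;
		long			nr_items;
	} ____cacheline_aligned_in_smp;

	struct numa_lru {
		struct node_lru		node[MAX_NUMNODES];
	};

	static void numa_lru_add(struct numa_lru *lru,
				 struct list_head *item)
	{
		/* add the object to the LRU of the node it lives on */
		struct node_lru *nlru =
			&lru->node[page_to_nid(virt_to_page(item))];

		spin_lock(&nlru->lock);
		list_add_tail(item, &nlru->list);
		nlru->nr_items++;
		spin_unlock(&nlru->lock);
	}

Reclaim targeted at a particular node then walks only that node's
list instead of hammering a single global lock.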
> The first 5 patches are preparations for this, descriptive (I hope)
> comments are inside them.
> 
> The general idea of this set is -- make the dentry subtrees be
> limited in size and shrink them as they hit the configured limit.

And what if the inode cache does not shrink with it?

> Why subtrees? Because this allows having the [dentry -> group]
> reference without a reference count, letting the [dentry -> parent]
> one handle this.
> 
> Why limited? For containers the answer is simple -- a container
> should not be allowed to consume too much of the host memory. For
> non-containerized systems the answer is -- to protect the kernel
> from non-privileged attacks on the dcache memory like the
> "while :; do mkdir x; cd x; done" one and similar.

Which will stop as soon as the path gets too long. And if this is
really a problem on your systems, quotas can prevent it from ever
being an issue....

> What isn't in this patchset yet, but should be done after the
> discussion:
> 
> * API. I haven't managed to invent any perfect solution, and would
> really like to have it discussed. In order to be able to play with
> it, the ioctls + proc for listing are proposed.
> 
> * New mounts management. Right now if you mount some new FS on a
> dentry which belongs to some managed set (I named it a "mob" in this
> patchset), the new mount is managed with the system settings. This is
> not OK; the new mount should be managed with the settings of the
> mountpoint's mob.
> 
> * Elegant shrink_dcache_memory on global memory shortage. For now the
> code walks the mobs and shrinks an equal number of dentries from each
> of them. Better shrinking policy can and probably should be
> implemented.

See above.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx