On Thu, Apr 13, 2006 at 10:23:25PM -0700, Andrew Morton wrote:
> David Chinner <dgc@xxxxxxx> wrote:
> > I've already fixed that problem with:
> >
> > http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1fc5d959d88a5f77aa7e4435f6c9d0e2d2236704
> >
> > and the machine showing the shrink_dcache_sb() problems is
> > already running that fix. That problem masked the shrink_dcache_sb
> > one - who notices a 10s hang when the machine has been really,
> > really slow for 20 minutes?
>
> So the problem is shrink_dcache_sb() and not regular memory reclaim?

*nod*

> What is happening on that machine to be triggering shrink_dcache_sb()?
> automounts?

That was what I first suspected, as the machine has lots of automounted
directories. However, breaking into kdb when the problem occurred showed
mounts of sysfs and /dev/pts.

From what I've been told, the chroot package build environment being used
(not something we control) mounts /proc, /sys, /dev/pts, etc. when the
chroot is entered, and unmounts them after the build is complete. Hence
for every package build an engineer kicks off, we get multiple mounts and
unmounts occurring.

> We fixed a similar problem in the inode cache a year or so back by creating
> per-superblock inode lists, so that
> search-a-global-list-for-objects-belonging-to-this-superblock thing went
> away. Presumably we could fix this in the same manner. But adding two
> more pointers to struct dentry would hurt.

*nod* I can't see any fields we'd be able to overload, either.

> An alternative might be to remove the global LRU altogether, make it
> per-superblock. That would reduce the quality of the LRUing, but that
> probably wouldn't hurt a lot.

That makes reclaim an interesting problem. You'd have to walk the
superblock list to get to each LRU, and then how would you ensure that
you fairly reclaim dentries from each filesystem?

> Another idea would be to take shrinker_sem for writing when running
> shrink_dcache_sb() - that would prevent tasks from coming in and getting
> stuck on dcache_lock, but there are plenty of other places which want
> dcache_lock.

Yeah, the contention we are seeing is from all those other places, so I
can't see how taking the shrinker semaphore really helps us here.

> I don't immediately see any simple tweaks which would allow us to avoid that
> long lock hold time. Perhaps the scanning in shrink_dcache_sb() could use
> just rcu_read_lock()...

We're modifying the list as we scan it, so I can't see how we can do this
without an exclusive lock.

The other thing that I thought of over the weekend is per-node LRU lists
and a lock per node. This will reduce the length of the lists, allow some
parallelism even while we scan and purge each list using the existing
algorithm, and not completely destroy the LRU-ness of the dcache. It would
also allow parallelising prune_dcache() and allow the shrinker to prune
the local node first (i.e. where we really need the memory).

FWIW, this is a showstopper for us. The only thing that is allowing us to
keep running a recent kernel on this machine is the fact that someone is
running `echo 3 > /proc/sys/vm/drop_caches` as soon as the slowdown
manifests, to blow away the dentry cache....

Cheers,

Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
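A minimal sketch of what the per-node LRU idea above might look like. The
names here (dentry_lru_node, d_lru_node, prune_dcache_node) are illustrative
assumptions, not taken from any posted patch; it assumes a d_lru_node list
head is added to struct dentry in place of its slot on the global
dentry_unused list:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/dcache.h>
#include <linux/numa.h>

/*
 * Illustrative only: one unused-dentry LRU per NUMA node, each protected
 * by its own lock instead of taking the global dcache_lock to walk one
 * long dentry_unused list.
 */
struct dentry_lru_node {
	spinlock_t		lock;		/* protects this node's LRU */
	struct list_head	unused;		/* LRU of unreferenced dentries */
	unsigned long		nr_unused;	/* length of this node's LRU */
} ____cacheline_aligned_in_smp;

static struct dentry_lru_node dentry_lru[MAX_NUMNODES];

/*
 * Prune up to 'count' dentries from one node's LRU.  The shrinker would
 * call this for the local node first, then fall back to other nodes, so
 * each list stays short and nodes can be scanned in parallel.
 */
static void prune_dcache_node(int nid, int count)
{
	struct dentry_lru_node *n = &dentry_lru[nid];

	spin_lock(&n->lock);
	while (count-- && !list_empty(&n->unused)) {
		struct dentry *dentry;

		dentry = list_entry(n->unused.prev, struct dentry,
				    d_lru_node);
		list_del_init(&dentry->d_lru_node);
		n->nr_unused--;
		/*
		 * Then take dentry->d_lock, recheck d_count and
		 * DCACHE_REFERENCED, and free the dentry the same way
		 * prune_dcache() does today.
		 */
	}
	spin_unlock(&n->lock);
}

This only shows the locking shape; how shrink_dcache_sb() would find a
superblock's dentries across the per-node lists, and how reclaim stays fair
between nodes, would still need to be worked out.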