On Wed, Oct 09, 2013 at 06:02:41AM -0700, Christoph Hellwig wrote:
> Currently libxfs has a cache for xfs_inode structures. Unlike in kernelspace,
> where the inode cache and the associated page cache for file data are used
> for all filesystem operations, the libxfs inode cache is only used in a few
> places:
>
>  - the libxfs init code reads the root and realtime inodes when called from
>    xfs_db using a special flag, but these inode structures are never
>    referenced again
>  - mkfs uses namespace and bmap routines that take the xfs_inode structure
>    to create the root and realtime inodes, as well as any additional files
>    specified in the proto file
>  - the xfs_db attr code uses xfs_inode-based attr routines in the attrset
>    and attrget commands
>  - phase6 of xfs_repair uses xfs_inode-based routines for rebuilding
>    directories and moving files to the lost+found directory
>  - phase7 of xfs_repair uses struct xfs_inode to modify the nlink count
>    of inodes
>
> So except in repair we never ever reuse a cached inode, and even in repair
> the logical inode caching doesn't help:
>
>  - in phase 6a we iterate over each inode in the incore inode tree, and
>    if it's a directory check/rebuild it
>  - phase 6b then updates the "." and ".." entries for directories that
>    need it, which means we require the backing buffers
>  - phase 6c moves disconnected inodes to lost+found, which again needs
>    the backing buffer to actually do anything
>  - phase 7 then only touches inodes for which we need to reset i_nlink,
>    which always involves reading, modifying and writing the physical
>    inode
>
> Given these facts, stop caching the inodes to reduce memory usage,
> especially in xfs_repair, where this makes a difference for filesystems
> with large inode counts. On the upper end this allows repair to complete
> for filesystem / memory combinations that previously wouldn't.

This all sounds good and the code looks fine, but there's one lingering
question I have - what's the impact on performance for repair? Does it
slow down phase 6/7 at all?

> With this we probably could increase the memory available to the buffer
> cache in xfs_repair, but trying to do so I got a bit lost - the current
> formula seems too magic to me to make any sense, and simply doubling the
> buffer cache size causes us to run out of memory, given that the data
> cached in the buffer cache (typically lots of 8k inode buffers and a few
> 4k other metadata buffers) is much bigger than the inodes cached in the
> inode cache. We probably need a sizing scheme that takes the actual
> amount of memory allocated to the buffer cache into account to solve
> this better.

IIRC, the size of the buffer cache is currently set at 75% of RAM, so
doubling it would cause OOM issues regardless of the presence of the
inode cache....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
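
As a rough illustration of the kind of memory-aware sizing scheme the quoted
paragraph asks for - and explicitly not xfs_repair's actual formula - the
sketch below derives a buffer cache entry budget from measured physical RAM,
using the 75% fraction mentioned in the reply and assuming the cache is
dominated by 8k inode buffers. The names (buffer_cache_entries,
AVG_BUFFER_SIZE) and the constants are made up for the example.

/*
 * Hypothetical sketch only -- not xfs_repair's real sizing code.
 * Budget buffer cache entries from physical memory: take a fixed
 * fraction of RAM (75%, per the discussion above) and divide by an
 * assumed average buffer size (8k inode cluster buffers dominate
 * during repair).
 */
#include <stdio.h>
#include <unistd.h>

#define CACHE_FRACTION_NUM	3	/* use 75% of RAM */
#define CACHE_FRACTION_DEN	4
#define AVG_BUFFER_SIZE		8192	/* assume mostly 8k inode buffers */

static long long buffer_cache_entries(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);
	long pagesize = sysconf(_SC_PAGE_SIZE);
	long long phys_mem;

	if (pages <= 0 || pagesize <= 0)
		return -1;	/* cannot determine physical memory */

	phys_mem = (long long)pages * pagesize;
	return (phys_mem * CACHE_FRACTION_NUM / CACHE_FRACTION_DEN) /
			AVG_BUFFER_SIZE;
}

int main(void)
{
	long long entries = buffer_cache_entries();

	if (entries < 0) {
		fprintf(stderr, "cannot determine physical memory\n");
		return 1;
	}
	printf("buffer cache budget: %lld entries of %d bytes\n",
			entries, AVG_BUFFER_SIZE);
	return 0;
}

The point of tying the entry count to measured RAM and to the expected
buffer size, rather than to a fixed constant, is that the cache could then
be grown safely once the inode cache no longer competes for the same memory.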