On Sep 28, 2008  00:27 -0400, Theodore Ts'o wrote:
> ext4: Use readahead when reading an inode from the inode table
>
> With modern hard drives, reading 64k takes roughly the same time as
> reading a 4k block.  So request readahead for adjacent inode table
> blocks to reduce the time it takes when iterating over directories
> (especially when doing this in htree sort order) in a cold cache case.
> With this patch, the time it takes to run "git status" on a kernel
> tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
> is reduced by 21%.

I'd actually thought that having a tunable in units of "kB" is better than
blocks, since userspace shouldn't have to know the filesystem block size
to tune readahead for a device.  Depending on the block size this tunable
can vary by 64x the amount of readahead (1kB vs. 64kB blocks).

> @@ -3969,6 +3934,36 @@ static int __ext4_get_inode_loc(struct inode *inode,
>
>  make_io:
>  		/*
> +		 * If we need to do any I/O, try to readahead up to 16
> +		 * blocks from the inode table.

Comment is out of date.

> +		if (EXT4_SB(sb)->s_inode_readahead_blks) {
> +			/* Make sure s_inode_readahead_blks is a power of 2 */
> +			while (EXT4_SB(sb)->s_inode_readahead_blks &
> +			       (EXT4_SB(sb)->s_inode_readahead_blks-1))
> +				EXT4_SB(sb)->s_inode_readahead_blks =
> +				   (EXT4_SB(sb)->s_inode_readahead_blks &
> +				    (EXT4_SB(sb)->s_inode_readahead_blks-1));

Is there a good reason why the readahead blocks is a power of 2?  Given
that the blocks are likely NOT contiguous for a directory, nor are they
aligned to the underlying LUN offsets, I don't think this is a benefit.

In any case, any tweaking of s_inode_readahead_blks should probably be
done at the time it is set instead of each time an inode is read.

> +				ext4_error(sb, "ext4_get_inode_loc",

s/ext4_get_inode_loc/__func__/?
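
A rough, untested sketch of the two suggestions above (a tunable in kB, and
normalizing the value once when it is set rather than on every inode read).
This is for illustration only: the helper names rounddown_pow2() and
readahead_kb_to_blks(), and the choice to clamp a small non-zero request up
to one block, are hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: largest power of two that is <= v.
 * Uses the same "clear the lowest set bit" loop as the patch, but is
 * meant to run once at set time. Returns 0 for v == 0.
 */
static uint32_t rounddown_pow2(uint32_t v)
{
	while (v & (v - 1))
		v &= v - 1;	/* drop the lowest set bit */
	return v;
}

/*
 * Hypothetical helper: convert a readahead tunable given in kB into a
 * count of filesystem blocks. blocksize_bits is log2 of the block size
 * in bytes (10 for 1kB blocks, 16 for 64kB blocks).
 * Assumption: a non-zero request smaller than one block still asks for
 * one block; zero keeps readahead disabled.
 */
static uint32_t readahead_kb_to_blks(uint32_t kb, unsigned int blocksize_bits)
{
	uint32_t blks = (uint32_t)(((uint64_t)kb << 10) >> blocksize_bits);

	if (kb != 0 && blks == 0)
		blks = 1;
	return blks;
}
```

With something along these lines, the mount option parser (or whatever sets
the value) would store the result of readahead_kb_to_blks(), optionally
passed through rounddown_pow2() if the power-of-2 restriction is kept, and
the make_io path would only need to read s_inode_readahead_blks.
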

> +		case Opt_inode_readahead_blks:
> +			if (option < 0 || option > 31)
> +				return 0;
> +			sbi->s_inode_readahead_blks = option;

This would appear to limit the inode_readahead_blks to 31 blocks, yet the
default is 32?  I suspect this is left over from when it was a shift?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.