On Sep 24, 2008 16:35 -0400, Theodore Ts'o wrote:
> On the other hand, if we take your iop/s and translate them to
> milliseconds so we can measure the latency in the case where the
> workload is essentially doing random reads, and then cross-correlate
> it with my measurements, we get this table:

Comparing the incremental benefit of each step (the two rightmost
columns added below):

> i/o size  iops/s  ms latency  % degradation      % improvement           next step
>                               of random inodes   of related inodes I/O   degradation  improvement
>     4k      131      7.634
>     8k      130      7.692        0.77%              11.3%                  1.57%        10.5%
>    16k      128      7.813        2.34%              21.8%                  1.63%         7.8%
>    32k      126      7.937        3.97%              29.6%                  4.29%         5.9%
>    64k      121      8.264        8.26%              35.5%                  7.67%         4.5%
>   128k      113      8.850       15.93%              40.0%                 16.07%         2.4%
>   256k      100     10.000       31.00%              42.4%
>
> Depending on whether you believe that workloads involving random inode
> reads are more common compared to related inodes I/O, the sweet spot
> is probably somewhere between 32k and 128k.  I'm open to opinions
> (preferably backed up with more benchmarks of likely workloads) on
> whether we should use a default value of inode_readahead_bits of 4 or
> 5 (i.e., 64k, my original guess, or 128k, in v2 of the patch).  But
> yes, making it tunable is definitely going to be necessary, since
> different workloads (e.g., squid vs. git repositories) will have very
> different requirements.

It looks like moving from 64kB to 128kB readahead might be a loss for
"unknown" workloads, since that increases latency by 7.67% for the
random inode case, but we only get 4.5% improvement in the sequential
inode case.

Also recall that at large scale "htree" breaks down to random inode
lookup, so that isn't exactly a fringe case (though readahead may still
help if the cache is large enough).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
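
For reference, a minimal sketch (not from the original thread or any
ext4 patch) of how the table above can be recomputed from the raw iops
numbers, assuming latency = 1000/iops in milliseconds, degradation
measured against the 4k baseline, and each "next step" value taken as
the difference between successive cumulative figures:

/*
 * Hypothetical helper: recompute the readahead table from the measured
 * random-inode iops and the cumulative "% improvement of related
 * inodes I/O" figures quoted above.  Output should match the latency
 * and cumulative-degradation columns up to rounding.
 */
#include <stdio.h>

struct run {
	int kb;			/* readahead size in KB */
	double iops;		/* measured random-inode iops */
	double improve;		/* cumulative % improvement, related inodes I/O */
};

int main(void)
{
	static const struct run runs[] = {
		{   4, 131,  0.0 },
		{   8, 130, 11.3 },
		{  16, 128, 21.8 },
		{  32, 126, 29.6 },
		{  64, 121, 35.5 },
		{ 128, 113, 40.0 },
		{ 256, 100, 42.4 },
	};
	const int n = sizeof(runs) / sizeof(runs[0]);
	double base = 1000.0 / runs[0].iops;	/* 4k latency baseline */

	for (int i = 0; i < n; i++) {
		double lat = 1000.0 / runs[i].iops;
		double degr = (lat - base) / base * 100.0;

		printf("%4dk  %5.0f  %7.3f  %6.2f%%  %5.1f%%",
		       runs[i].kb, runs[i].iops, lat, degr, runs[i].improve);
		if (i + 1 < n) {
			/* cost/benefit of doubling the readahead once more */
			double nlat = 1000.0 / runs[i + 1].iops;
			double ndegr = (nlat - base) / base * 100.0;

			printf("  %6.2f%%  %5.1f%%",
			       ndegr - degr,
			       runs[i + 1].improve - runs[i].improve);
		}
		printf("\n");
	}
	return 0;
}

On the tunable itself, inode_readahead_bits is presumably the log2 of
the number of 4k inode-table blocks to read ahead, which is consistent
with the sizes quoted above: a value of 4 gives 16 blocks = 64k, and 5
gives 32 blocks = 128k.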