On Mon, 2006-11-13 at 16:57 -0700, Andreas Dilger wrote:
> On Nov 14, 2006 00:32 +0100, Ihar `Philips` Filipau wrote:
> > As person throwing in the idea, I feel bit responsible. So here go my
> > results from my primitive script (bear with my bashism) on my plain
> > Debian/unstable with 123k files on 10GB partition with ext3, default
> > 8K block.
> >
> > Script to count small files:
> > -+-
> > #!/bin/bash
> > find / -xdev 2>/dev/null | wc -l
> > find / -xdev -\( $(seq -f '-size %gc -o' 1 63) -false -\) 2>/dev/null | wc -l
> > find / -xdev -\( $(seq -f '-size %gc -o' 64 128) -false -\) 2>/dev/null | wc -l
> > -+-
> > First line to find all files on root fs, second to find all files with
> > sizes 1-63 bytes, third - 64-128. (Param '-xdev' tells find to remain
> > on same fs to exclude proc/sys/tmp and so on)
> >
> > And on my system counts are:
> > -+-
> > 107313
> > 8302
> > 2618
> > -+-
> >
> > This is 10.1% of all files - are small files under 128 bytes. (7.7% < 63
> > bytes)
> >
> > [ Results for /etc: 1712, 666, 143 (+ 221 file of size in range
> > 129-512 bytes) - small files are better half of whole /etc. ]
>
> Note that using the root filesystem is a skewed result (esp. on GTK systems
> where lots of single-valued files are used by gconf). Many root filesystems
> using ext3 are formatted with 1kB blocks for this reason. Also gather stats
> for other filesystems.
>
> At the filesystem summit we DID find a surprising number of small files
> even when the whole system was examined. We discussed storing small
> files directly in the inode along with other EAs (this would require
> larger inodes). This improves data locality and performance (i.e. stat
> of the file loads the small file data into cache), though the assumption
> is that there will be an increasing number of EAs on files in the future.
> It also avoids the issues w.r.t. packing file data from different files
> into the same block and they have different lifespans, etc.
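As an aside, the three find passes quoted above could probably be collapsed
into a single pass that also restricts the count to regular files. What
follows is only a rough sketch, assuming GNU find (for -printf) and awk are
available; the function name count_small_files is made up for illustration
and is not part of the original script.

```bash
#!/bin/bash
# Sketch: count regular files by size band in one pass over one filesystem.
# Prints three numbers: total regular files, files of 1-63 bytes,
# and files of 64-128 bytes.
# Assumes GNU find (-printf); count_small_files is a hypothetical name.
count_small_files() {
    local dir=${1:-/}
    # -xdev keeps find on the starting filesystem;
    # -type f skips directories, symlinks and device nodes;
    # -printf '%s\n' emits each file's size in bytes, one per line.
    find "$dir" -xdev -type f -printf '%s\n' 2>/dev/null |
    awk '
        { total++ }
        $1 >= 1  && $1 <= 63  { tiny++ }
        $1 >= 64 && $1 <= 128 { small++ }
        END { printf "%d %d %d\n", total, tiny, small }
    '
}
```

Calling `count_small_files /` would cover the root filesystem and
`count_small_files /etc` a single directory tree. Because it counts only
regular files, its total is not directly comparable to the output of the
first find in the quoted script, which counts every directory entry.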
I would agree that if the focus is on files that are 128 bytes or smaller,
storing the data in the inode makes the most sense. I don't think it's
worth the complexity of doing any kind of tail merging unless you expect
a large number of small files to be too big to practically fit in the
inode, but small enough that it is worth doing something to store them
efficiently. Symbolic links have been stored in the inode for a long time.

-- 
David Kleikamp
IBM Linux Technology Center