On 11/13/06, Bryan Henderson <hbryan@xxxxxxxxxx> wrote:
> > Good point.  But wouldn't the page cache suffer regardless?  (You can't
> > split up pages between files, AFAIK.)
>
> Yeah, you're right, if we're talking about granularity finer than the
> page size.  But furthermore, as long as we're just talking about
> techniques to reduce internal fragmentation in the disk allocations,
> there's no reason either the cache usage or the data transfer traffic
> has to be affected (the fact that a whole block is allocated doesn't
> mean you have to read or cache the whole block).  But head movement and
> rotational latency are worth considering.  If you
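For what it's worth, the internal-fragmentation cost Bryan mentions can be
measured directly. Here is a rough sketch (mine, not from the thread) that
sums the per-file slack - the unused tail of each file's last block - for
an assumed block size:

```shell
#!/bin/sh
# Sketch: total internal fragmentation ("slack") for a given block size.
# slack() reads file sizes, one per line; the block size is its argument.
slack() {
  awk -v b="$1" '
    { blocks = int(($1 + b - 1) / b)   # blocks a b-byte-block fs would use
      waste += blocks * b - $1         # unused tail of the last block
      total += $1 }
    END { printf "%d bytes data, %d bytes slack\n", total, waste }'
}
# Example: run against the root fs assuming an 8K block (GNU find):
#   find / -xdev -type f -printf "%s\n" 2>/dev/null | slack 8192
```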
As the person who threw in the idea, I feel a bit responsible. So here are
my results from a primitive script (bear with my bashisms) on my plain
Debian/unstable system with 123k files on a 10GB ext3 partition, default
8K block.

Script to count small files:

-+-
#!/bin/bash
find / -xdev 2>/dev/null | wc -l
find / -xdev \( $(seq -f '-size %gc -o' 1 63) -false \) 2>/dev/null | wc -l
find / -xdev \( $(seq -f '-size %gc -o' 64 128) -false \) 2>/dev/null | wc -l
-+-

The first line counts all files on the root fs, the second all files of
size 1-63 bytes, and the third those of 64-128 bytes. ('-xdev' tells find
to stay on the same fs, excluding /proc, /sys, /tmp and so on; the
trailing '-false' terminates the chain of '-o' alternatives that seq
generates.)

On my system the counts are:

-+-
107313
8302
2618
-+-

So 10.1% of all files are small files of 128 bytes or less (7.7% are 63
bytes or less).

[ Results for /etc: 1712, 666, 143 (plus 221 files in the 129-512 byte
range) - small files are the better half of the whole /etc. ]

[ In fact, this optimization for small blocks is widely used in network
equipment: many intelligent devices can use several packet queues -
sorted by size - for delivering ingress packets to RAM. One device I
wrote a driver for allowed four queues, with recommended sizes of 32,
128, 512 and 2048 bytes: sizes that let it pull lots of small/medium
packets (normally control traffic - ICMP, TCP ACK/SYN, etc.) into RAM
without depleting the large buffers (normally used for data traffic).
I posted here because I was a bit surprised to see somebody apply a
similar idea to file systems. ]

The most important outcome of the optimization might be that future FSs
wouldn't be afraid to set the cluster size higher than is accepted now:
e.g. the standard is 4/8/16K today, but with a small-file (+ tail)
optimization it could be ramped up to 32/64/128K.

--
Don't walk behind me, I may not lead. Don't walk in front of me, I may
not follow. Just walk beside me and be my friend.
-- Albert Camus (attributed to)
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
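P.S. The multi-pass seq/find trick in the script above can also be done in
a single pass; a sketch (not part of the original mail) that prints the
same three counts:

```shell
#!/bin/sh
# Sketch: one find + awk pass instead of three separate finds.
# count_sizes() reads file sizes, one per line, and prints:
#   <all entries> <entries of 1-63 bytes> <entries of 64-128 bytes>
count_sizes() {
  awk '{ n++ }
       $1 >= 1  && $1 <= 63  { a++ }
       $1 >= 64 && $1 <= 128 { b++ }
       END { print n+0, a+0, b+0 }'
}
# Usage (GNU find): find / -xdev -printf "%s\n" 2>/dev/null | count_sizes
```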
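P.P.S. The queue-selection idea from the network analogy, reduced to its
core, is just "smallest buffer class that fits wins". A sketch (the
function name is mine; the 32/128/512/2048 sizes are the ones recommended
for the device mentioned above):

```shell
#!/bin/sh
# Sketch of size-sorted queue selection: a packet goes to the smallest
# buffer class that can hold it, leaving the big buffers for data traffic.
bucket() {
  for b in 32 128 512 2048; do
    if [ "$1" -le "$b" ]; then echo "$b"; return; fi
  done
  echo "drop"   # bigger than the largest buffer class
}
```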