Based on the graphs which Eric posted, one interesting thing I think you'll find if you repeat the ext3 experiment with e2fsck -t -t is that pass2 will be about seven times longer than pass1. (Which is backwards from most e2fsck runs, where pass2 takes about half of pass1's run time --- although obviously that depends on how many directory blocks you have.)

Yes, some kind of reservation windows would help on ext3 --- but the question is whether such a change would be too specific to this benchmark or not. Most of the time directories don't grow to such a huge size. So if you use a smallish reservation window (around 8 blocks, say) for many directories, this might lead to more filesystem fragmentation, which in the long run would cause the filesystem not to age well. It also wouldn't help much when you have over 11 million files in the directory, and a directory with over 100,000 blocks.

I don't think delayed allocation is what's helping here either, because the journal will force the directory blocks to be placed as soon as we commit a transaction. I think what's saving us here is that flex_bg and mballoc are separating the directory blocks from the data blocks, allowing the directory blocks to be closely packed together.

- Ted