On Thu 14-05-15 21:23:04, Dave Chinner wrote:
> On Wed, May 13, 2015 at 08:52:59PM -0700, Linus Torvalds wrote:
> > And readdir() itself, for that matter - we have no good vfs-level
> > readdir caching, so it all ends up serialized on the inode
> > semaphore, and it all goes all the way into the filesystem to get
> > the readdir data. And at least for ext4, readdir()
> > is slow anyway, because it doesn't use the page cache, it uses
> > that good old buffer cache, because of how ext4 does metadata
> > journaling etc.
>
> IIRC, ext4 readdir is not slow because of the use of the buffer
> cache, it's slow because of the way it hashes dirents across blocks
> on disk. i.e. it has locality issues, not a caching problem.

For ext4, readdir is just a linear read of the directory. Linus is right
that we store directory blocks in the buffer cache, but we do our own
readahead on directory blocks, so I don't think much slowness comes from
that. One thing that does slow us down is that we don't do preallocation
for directories, so they often end up heavily fragmented.

The locality problem you are probably referring to is that readdir on ext4
returns directory entries in hash order. That differs from ordering by
inode number, which is optimal for the cache-cold stat / unlink / whatever
you want to do with the inodes afterwards. This causes big performance
issues, e.g. if you do rm -rf on a large directory hierarchy. But you don't
see that often these days, as lots of utilities have learned to work around
the ext4 problem by sorting directory entries by inode number before doing
anything with them.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
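
For illustration only, the userspace workaround described above (sort the
readdir results by inode number, then touch the inodes) might look roughly
like the sketch below. It is not taken from any particular utility. The
function names are hypothetical, and it assumes Python's os.scandir(), whose
DirEntry.inode() returns the inode number from the directory entry itself on
Linux, so the sort does not need a stat() per file.

```python
import os


def entries_in_inode_order(path):
    """Return the entries of directory `path`, sorted by inode number.

    Hypothetical helper: readdir order on ext4 is hash order, so we
    reorder before doing any per-inode work.
    """
    with os.scandir(path) as it:
        # DirEntry.inode() uses the d_ino value returned by readdir on
        # Unix, so sorting here does not stat each file.
        return sorted(it, key=lambda entry: entry.inode())


def sizes_in_inode_order(path):
    """stat() every entry in inode order and return (name, size) pairs."""
    result = []
    for entry in entries_in_inode_order(path):
        # The per-inode operation (stat here, but it could be unlink)
        # now walks the inodes in ascending inode-number order.
        st = os.stat(entry.path, follow_symlinks=False)
        result.append((entry.name, st.st_size))
    return result
```

Whether this helps in practice depends on the filesystem and on whether the
inodes are already cached; on a warm cache the extra sort buys little.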