On Jul 16, 2011, at 9:02 PM, Bernd Schubert wrote:

> I don't understand it either yet why we have so many, but each directory
> has about 20 to 30 index blocks

OK, I think I know what's going on. Those aren't 20-30 index blocks;
those are 20-30 leaf blocks. Your directories are approximately
80-120k each, right?

So what your patch is doing is constantly doing readahead to bring the
*entire* directory into the buffer cache any time you do a dx_probe.
That's definitely not what we would want to enable by default, but I
really don't like the idea of adding Yet Another Mount option. It
expands our testing effort, and the reality is very few people will
take advantage of the mount option.

How about this? What if we don't actually perform readahead, but
instead try to look up all of the blocks to see if they are in the
buffer cache using sb_find_get_block(). If a block is in the buffer
cache, it will get touched, so it will be less likely to be evicted
from the page cache. So for a workload like yours, it should do what
you want. But it won't cause all of the pages to get pulled in after
the first reference of the directory in question.

I'm still worried about the case of a very large directory (say an
unreaped tmp directory that has grown to be tens of megabytes). If a
program does a sequential scan through the directory doing a
"readdir+stat" (for example, a tmp cleaner or someone running the
command "ls -sF"), we probably shouldn't be trying to keep all of
those directory blocks in memory. So if a sequential scan is detected,
that should probably suppress the calls to sb_find_get_block().

- Ted
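
To make the idea above concrete, here is a rough user-space toy model in
Python. It is not kernel code and not a proposed patch: the LRU cache, the
ToyBufferCache and ToyDir classes, the probe() method, and the seq_threshold
heuristic are all hypothetical stand-ins chosen for illustration. Only the
behaviour being imitated (a lookup that touches a cached block without
issuing I/O, as with sb_find_get_block(), and skipping that step during a
sequential scan) comes from the discussion.

```python
from collections import OrderedDict


class ToyBufferCache:
    """LRU set of block numbers; a stand-in for the kernel buffer cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> None, oldest first
        self.reads = 0               # count of simulated disk reads

    def find_get_block(self, blk):
        """Lookup only, imitating sb_find_get_block(): never does I/O."""
        if blk in self.blocks:
            self.blocks.move_to_end(blk)  # touch: now most recently used
            return True
        return False

    def read_block(self, blk):
        """Lookup, falling back to a simulated disk read on a miss."""
        if not self.find_get_block(blk):
            self.reads += 1
            self.blocks[blk] = None
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


class ToyDir:
    """A directory reduced to its list of leaf block numbers."""

    def __init__(self, leaf_blocks, seq_threshold=4):
        self.leaf_blocks = list(leaf_blocks)
        self.seq_threshold = seq_threshold  # assumed heuristic, not from ext4
        self.last_index = None
        self.seq_run = 0

    def looks_sequential(self, index):
        """Count consecutive accesses to index, index + 1, index + 2, ..."""
        if self.last_index is not None and index == self.last_index + 1:
            self.seq_run += 1
        else:
            self.seq_run = 0
        self.last_index = index
        return self.seq_run >= self.seq_threshold

    def probe(self, cache, index):
        """Model a lookup that ends up in leaf block number `index`."""
        cache.read_block(self.leaf_blocks[index])
        if self.looks_sequential(index):
            return  # readdir-style scan: do not try to keep blocks warm
        for blk in self.leaf_blocks:
            cache.find_get_block(blk)  # touch if cached, no readahead
```

In this model a single probe reads exactly one block, so the first reference
to a directory does not pull the whole thing in. Repeated random probes keep
whatever directory blocks are already cached at the recently-used end of the
LRU, while a long in-order walk stops touching them and lets them age out
normally.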