On Aug 06 2014, Theodore Ts'o wrote:
>
> I don't subscribe to kernelnewbies, but I came across this thread in
> the mail archive while researching an unrelated issue.
>
> Valdis' observations are on the mark here. It's almost certain that
> you are getting overwhelmed with other disk traffic, because your
> directory isn't *that* big.

Thank you very much. As the user in question, I'm afraid this one
turns out to be a clear case of "user is an idiot." I made a dumb
mistake in the way I was measuring things. The situation on this
server is not as bad as it looked.

> That being said, there are certainly issues with really really big
> directories, and solving this is certainly not going to be a newbie
> project (if it was easy to solve, it would have been addressed a long
> time ago). See:
>
> http://en.it-usenet.org/thread/11916/10367/

However, this response is precious. Suddenly a whole bunch of things
make sense from that posting alone. The last time I looked seriously
at file system code, it was the Berkeley Fast File System, also known
as UFS. I've never had the time and inclination to look at a modern
file system. That article straightened out multiple misconceptions
for me, and pointed me in good directions.

> for the background. It's a little bit dated, in that we do use a
> 64-bit hash on 64-bit systems, but the fundamental issues are still
> there.

And that's in addition to what you covered here, which includes what
might be a useful workaround for the application, which may or may
not be hitting the problem that the ls test was meant to simplify.
I'm passing that on to the app developer. Many, many thanks.

> If you sort the readdir files by inode order, this can help
> significantly. Some userspace programs, such as mutt, do this.
> Unfortunately "ls" does not. (That might be a good newbie project,
> since it's a userspace-only project. However, I'm pretty sure the
> shellutils maintainers will also react negatively if they are sent
> patches which don't compile. :-)
>
> A proof of concept of how this can be a win can be found here:
>
> http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c
>
> LD_PRELOAD hacks aren't guaranteed to work on all programs, so this is
> much more of a hack than something I'd recommend for extended
> production use. But it shows that if you have a readdir+stat
> workload, sorting by inode makes a huge difference.
>
> As far as getting traces to better understand problems, I strongly
> suggest that you try things like vmstat, iostat, and blktrace; system
> call traces like strace aren't going to get you very far. (See
> http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
> introduction to blktrace). Use the scientific method; collect
> baseline statistics using vmstat, iostat, sar, before you run your
> test workload, so you know how much I/O is going on before you start
> your test. If you can run your test on a quiescent system, that's a
> really good idea. Then collect statistics as you run your workload,
> and then only tweak one variable at a time, and record everything in a
> systematic way.

Another tool I didn't know about. Thank you very much.

> Finally, if you have more problems of a technical nature with respect
> to the ext4, there is the ext3-users@xxxxxxxxxx list, or the
> developer's list at linux-ext4@xxxxxxxxxxxxxxx. It would be nice if
> you tried the ext3-users or the kernel-newbies or tried googling to
> see if anyone else has come across the problem and figured out the
> solution already, but if you can't figure things out any other way, do
> feel free to ask the linux-ext4 list. We won't bite. :-)

Thank you.
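For anyone who wants to try the "sort readdir output by inode, then stat" idea without an LD_PRELOAD shim, here is a minimal C sketch, assuming a Linux/glibc system. It is not spd_readdir.c, and the function name stat_dir_in_inode_order is hypothetical. It first reads every entry name and its d_ino, sorts the list by inode number with qsort, and only then calls fstatat, so the inode table is visited in roughly ascending order instead of the hash order that an ext4 htree directory returns from readdir.

```c
#define _DEFAULT_SOURCE           /* for dirfd(), fstatat(), strdup() on glibc */
#include <dirent.h>
#include <fcntl.h>                /* AT_SYMLINK_NOFOLLOW */
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One directory entry: its inode number and a private copy of its name. */
struct entry {
    ino_t ino;
    char *name;
};

/* qsort comparator: ascending inode number. */
static int cmp_ino(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    return (ea->ino > eb->ino) - (ea->ino < eb->ino);
}

/* Hypothetical helper: list `path`, sort the entries by d_ino, then
 * stat them in that order. Returns the number of successful stats,
 * or -1 if the directory cannot be opened or memory runs out. */
long stat_dir_in_inode_order(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct entry *ents = NULL;
    size_t n = 0, cap = 0;
    long ok = -1;
    struct dirent *de;

    /* Pass 1: collect names and inode numbers only; no stat yet.
     * On ext4 with dir_index, this order is hash order, not inode order. */
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {
            size_t newcap = cap ? cap * 2 : 64;
            struct entry *tmp = realloc(ents, newcap * sizeof *ents);
            if (tmp == NULL)
                goto out;
            ents = tmp;
            cap = newcap;
        }
        ents[n].name = strdup(de->d_name);
        if (ents[n].name == NULL)
            goto out;
        ents[n].ino = de->d_ino;
        n++;
    }

    /* Pass 2: sort by inode number, then stat in that order. */
    if (n > 0)
        qsort(ents, n, sizeof *ents, cmp_ino);
    ok = 0;
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        if (fstatat(dirfd(dir), ents[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0)
            ok++;
    }

out:
    /* Free every name we successfully copied, then the array itself. */
    for (size_t i = 0; i < n; i++)
        free(ents[i].name);
    free(ents);
    closedir(dir);
    return ok;
}
```

Called as stat_dir_in_inode_order("/path/to/bigdir"), it returns how many entries were stat'ed. Timing it on a cold cache against the same loop with the qsort call removed is one way to check, in the measure-one-variable-at-a-time spirit Ted describes, whether inode ordering actually helps on a particular file system.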
I'll make sure to do my homework properly in future, and never, never
believe things senior members of my team tell me without verifying
them first, at least not if I'm going to post about them. :-(

>
> Cheers,
>
> - Ted
>
> P.S. If you have a large number of directories which are much larger
> than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
> foo.new ; rmdir foo; mv foo.new foo" trick on a large number of
> directories, you can also schedule downtime and while the file system
> is unmounted, use "e2fsck -fD". See the man page for more details.
> It won't solve all of your problems, and it might not solve any of
> your problems, but it will probably make the performance of large
> directories somewhat better.

Another hint of substantially more value than everything I posted
about this topic. Thank you again.

-- 
Arlie

(Arlie Stephens arlie@xxxxxxxxxxxx)