On Wed, Aug 6, 2014 at 2:26 PM, Arlie Stephens <arlie@xxxxxxxxxxxx> wrote:
> On Aug 06 2014, Theodore Ts'o wrote:
>>
>> I don't subscribe to kernelnewbies, but I came across this thread in
>> the mail archive while researching an unrelated issue.
>>
>> Valdis' observations are on the mark here.  It's almost certain that
>> you are getting overwhelmed with other disk traffic, because your
>> directory isn't *that* big.
>
> Thank you very much.  As the user in question, I'm afraid this one
> turns out to be a clear case of "user is an idiot."
>
> I made a dumb mistake in the way I was measuring things.  The situation
> on this server is not as bad as it looked.
>
>> That being said, there are certainly issues with really, really big
>> directories, and solving this is certainly not going to be a newbie
>> project (if it were easy to solve, it would have been addressed a long
>> time ago).  See:
>>
>> http://en.it-usenet.org/thread/11916/10367/
>
> However, this response is precious.  Suddenly a whole bunch of things
> make sense from that posting alone.  The last time I looked seriously at
> file system code, it was the Berkeley Fast File System, also known as
> UFS.  I've never had the time and inclination to look at a modern file
> system.  That article managed to straighten out multiple misconceptions
> for me and point me in good directions.
>
>> for the background.  It's a little bit dated, in that we do use a
>> 64-bit hash on 64-bit systems, but the fundamental issues are still
>> there.
>
> And that's in addition to what you covered here -- which includes what
> might be a useful workaround for the application that may or may not
> be hitting the problem that the ls test was intended to simplify.  I'm
> passing that on to the app developer.
>
> Many, many thanks.
>
>> If you sort the files returned by readdir() in inode order, this can
>> help significantly.  Some userspace programs, such as mutt, do this.
>> Unfortunately "ls" does not.
>> (That might be a good newbie project,
>> since it's a userspace-only change.  However, I'm pretty sure the
>> shellutils maintainers will also react negatively if they are sent
>> patches which don't compile.  :-)
>>
>> A proof of concept of how this can be a win can be found here:
>>
>> http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/tree/contrib/spd_readdir.c
>>
>> LD_PRELOAD isn't guaranteed to work on all programs, so this is much
>> more of a hack than something I'd recommend for extended production
>> use.  But it shows that if you have a readdir+stat workload, sorting
>> by inode makes a huge difference.
>>
>> As far as getting traces to better understand problems, I strongly
>> suggest that you try tools like vmstat, iostat, and blktrace; system
>> call traces like strace aren't going to get you very far.  (See
>> http://brooker.co.za/blog/2013/07/14/io-performance.html for a nice
>> introduction to blktrace.)  Use the scientific method: collect
>> baseline statistics using vmstat, iostat, and sar before you run your
>> test workload, so you know how much I/O is going on before you start
>> your test.  If you can run your test on a quiescent system, that's a
>> really good idea.  Then collect statistics as you run your workload,
>> tweak only one variable at a time, and record everything in a
>> systematic way.
>
> Another tool I didn't know about.  Thank you very much.
>>
>> Finally, if you have more problems of a technical nature with respect
>> to ext4, there is the ext3-users@xxxxxxxxxx list, or the developers'
>> list at linux-ext4@xxxxxxxxxxxxxxx.  It would be nice if you tried
>> ext3-users or kernel-newbies, or tried googling to see if anyone else
>> has come across the problem and figured out the solution already, but
>> if you can't figure things out any other way, do feel free to ask the
>> linux-ext4 list.  We won't bite.  :-)
>
> Thank you.
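Ted's point about readdir+stat workloads is easy to try from userspace. The sketch below is not the spd_readdir.c LD_PRELOAD shim he links to, just the same idea expressed in Python: read the directory entries first, sort them by inode number (available from the directory entry itself, without a per-file stat), and only then stat each file, so the inode-table reads happen in roughly sequential order.

```python
import os

def stat_in_inode_order(path):
    """Stat every entry in `path`, visiting files in inode order.

    On ext4, readdir() returns names in htree hash order, which scatters
    the follow-up stat() calls across the inode table; sorting by inode
    number first makes that I/O much more sequential.
    """
    with os.scandir(path) as entries:
        # DirEntry.inode() reads d_ino from the directory entry itself,
        # so no stat() is needed to build the sort key.
        ordered = sorted(entries, key=lambda e: e.inode())
    # Now stat in inode order (DirEntry objects stay valid after the
    # scandir iterator is closed).
    return [(e.name, e.stat(follow_symlinks=False)) for e in ordered]

if __name__ == "__main__":
    for name, st in stat_in_inode_order("."):
        print(st.st_ino, name)
```

On a small directory this makes no visible difference; the win shows up when the directory has enough entries that the inode table no longer fits in cache.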
> I'll make sure to do my homework properly in future -- and
> never, never believe things senior members of my team tell me without
> verifying them first, at least not if I'm going to post about them :-(
>
>>
>> Cheers,
>>
>> - Ted
>>
>> P.S.  If you have a large number of directories which are much larger
>> than you expect, and you don't want to do the "mkdir foo.new; mv foo/*
>> foo.new; rmdir foo; mv foo.new foo" trick on each of them, you can
>> also schedule downtime and, while the file system is unmounted, use
>> "e2fsck -fD".  See the man page for more details.  It won't solve all
>> of your problems, and it might not solve any of your problems, but it
>> will probably make the performance of large directories somewhat
>> better.
>
> Another hint of substantially more value than everything I posted
> about this topic.
>
> Thank you again.
>
> --
> Arlie
>
> (Arlie Stephens arlie@xxxxxxxxxxxx)
>
> _______________________________________________
> Kernelnewbies mailing list
> Kernelnewbies@xxxxxxxxxxxxxxxxx
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

Thanks, Ted, for clearing this up for me; it seems the issue was not in ext4 after all. Would you mind CCing me on this conversation as a learning read?

Regards and thanks,
Nick
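For completeness, the directory-rebuild trick from Ted's P.S. can be sketched in Python as well. This is just a translation of the shell one-liner, not anything from e2fsprogs, and it carries the same caveat: it is only safe while nothing else has the directory or its files open.

```python
import os

def rebuild_directory(path):
    """Re-create `path` so its on-disk directory blocks are compacted.

    Python equivalent of the shell trick:
        mkdir foo.new; mv foo/* foo.new; rmdir foo; mv foo.new foo
    The temporary directory is a sibling of `path`, so every rename
    stays on the same filesystem.
    """
    tmp = path + ".new"
    os.mkdir(tmp)
    for name in os.listdir(path):
        os.rename(os.path.join(path, name), os.path.join(tmp, name))
    os.rmdir(path)        # must be empty by this point
    os.rename(tmp, path)  # atomic swap back to the original name
```

The offline alternative, as Ted notes, is `e2fsck -fD` on the unmounted filesystem, which rebuilds and compacts directories in place.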