On 06Jun2006 10:05, Esquivel, Vicente <Esquivelv@xxxxxxx> wrote:
| Can anyone tell me if they have experienced long waits while trying
| to list a directory with a huge amount of files in it?

Sure. It is a common issue with very large directories.

| One of our servers, running RHEL 4, has a directory that contains
| over 2 million files in it and growing. The files are all small in
| size, but there are a lot of them due to the application that runs
| on this server. I have tried to run "ls" and "ls -l" inside that
| directory, but it just seems to run for a long time with no output;
| I am assuming that if I leave it running long enough it will
| eventually list them all. I was just wondering if anyone has seen
| this before or has a better way of getting a listing of all the
| files inside a directory like that.

There are various causes for delay (aside from sheer size).

First, as mentioned, ls sorts its output, which requires it to read
all the entries before printing anything. The -f option skips the
sort.

Second, on Red Hat systems the default install includes an alias for
"ls" that tries to colour files by type. It's very annoying. It is
also expensive.

You will understand that "ls -l" is expensive because ls must lstat()
every file in order to get the information for a long (-l) listing.
You would expect that a plain "ls" does not need to do that, and it
should not - it only needs the filenames, which come directly from
the directory. However, by aliasing "ls" to the colourising mode,
"ls" is again forced to lstat() every file (and worse, stat() every
symlink!) in order to determine file types and so choose colours.

Try saying:

  /bin/ls -f
or
  /bin/ls
or
  unalias ls; ls -f
or
  unalias ls; ls

and see if the behaviour improves. (The first postscript below shows
a quick way to check what your "ls" really is.)

Third, very large, flat (no subdirectories) directories are quite
expensive on many filesystems, because doing a stat() or lstat() to
look up file details involves reading the directory contents to map
the filename to the file's inode number. Often that is a linear read
of the directory (some filesystems use a more sophisticated internal
structure than a simple linear list, but that is still uncommon). In
consequence, stat()ing every file requires 2,000,000 reads of the
directory, and each such read will on average scan about half the
contents (it can stop when it finds the filename, which may be
anywhere in the list). So the cost of "ls -l" is roughly the _square_
of the number of directory entries - here, of the order of
2,000,000 x 1,000,000 = 2 x 10^12 entry comparisons.

It is usually a performance improvement to break large flat
directories into subdirectories. You still need to stat() everything
in the long run (2,000,000 items), but the linear cost per lookup can
be reduced because each individual directory is smaller. (The second
postscript below sketches one way to do the split.)

Finally, the sheer size of the directory may be exceeding some stupid
hardwired limit in the Veritas backup utility, although I'd expect
the Veritas people to know about such a limit if it exists.

Cheers,
--
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

Dangerous stuff, science. Lots of us not fit for it.
        - H.C. Bailey, _The Long Dinner_
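
P.S. To see what is actually behind "ls" on your box, and to compare
the plain and long listings directly, the following may help. This is
only a rough sketch: the exact alias definition on your system may
differ, and the timings obviously depend on your filesystem.

  type ls                      # typically: ls is aliased to `ls --color=tty'
  alias ls                     # shows just the alias definition, if any
  time /bin/ls -f >/dev/null   # unsorted, uncoloured; no per-file lstat()
  time /bin/ls -l >/dev/null   # long listing; lstat()s every file

The gap between the last two timings should make the lstat() cost
pretty obvious.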
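
P.P.S. If you do decide to split the directory up, something along
these lines spreads the existing files into subdirectories keyed on
the first two characters of each filename. It is an untested sketch:
/path/to/bigdir is a placeholder, the "d_" prefix merely keeps the
new directories from colliding with existing filenames, and the
hashing scheme should be chosen to suit your application.

  cd /path/to/bigdir || exit 1
  for f in *
  do
    [ -f "$f" ] || continue           # skip anything not a plain file
    sub="d_$(printf '%.2s' "$f")"     # e.g. "d_ab" for files beginning "ab"
    mkdir -p -- "$sub"
    mv -- "$f" "$sub/"
  done

The loop itself will be slow over 2,000,000 entries, but it is a
one-off; the application writing the files then needs to use the same
scheme, or the top level will simply fill up again.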