On Oct 8, 2009, at 1:30 AM, Cameron Simpson wrote:
On 07Oct2009 16:57, Miner, Jonathan W (US SSA) <jonathan.w.miner@xxxxxxxxxxxxxx> wrote:
| The issue with 'ls' is that it wants to sort the output. You may
| want to try using "-f", which says "do not sort"
No, sorting is actually pretty cheap.
The issue with ls and large directories is usually the fact that ls
stat()s all the names. Plenty of other things need to stat() everything
too; backups of all kinds, for example. A stat() requires the OS to
search the directory to map the stat()ed name to an inode, and that's a
linear operation on ext3 if you haven't turned on directory hashing. In
consequence, the 'ls' cost goes as the square of the number of directory
entries (n names, each asking for a stat() whose cost is O(n), so
O(n^2) for the whole thing).
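A quick way to see the two costs separately (a sketch; the temp
directory and the 5000-file count are arbitrary choices): "ls -f" only
reads the directory and prints names, while "ls -l" must additionally
stat() every entry, paying the per-name lookup described above.

```shell
dir=$(mktemp -d)
seq 1 5000 | sed "s|^|$dir/f|" | xargs touch
count=$(ls -f "$dir" | wc -l)   # readdir only: 5000 files plus "." and ".."
time ls -f "$dir" > /dev/null   # readdir + print: no sort, no stat
time ls -l "$dir" > /dev/null   # sort + one stat() per name
rm -rf "$dir"
```

On an indexed filesystem the two timings stay close; on ext3 without
dir_index the gap widens as the directory grows.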
The usual approach is to make a tree of subdirectories to mitigate the
per-directory cost (keeping the n in that n^2 low).
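Sketched out, such a tree might hash each name into a two-level fan-out
of subdirectories so that no single directory grows huge. The "first
two hex digits of md5" bucket scheme below is just an illustration, not
any standard layout:

```shell
# Store a file under data/<bucket>/<name>, where <bucket> is the
# first two hex digits of the md5 of the name (256 possible buckets).
store() {
    name=$1
    bucket=$(printf '%s' "$name" | md5sum | cut -c1-2)
    mkdir -p "data/$bucket"
    touch "data/$bucket/$name"
}
store "example-file-0001"
```

With 256 buckets, a million files works out to roughly 4000 entries per
directory, so each linear lookup scans thousands of names instead of a
million.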
Just out of curiosity I did the following:
1) Created directories with sets of empty files
2) Created a script to time /bin/ls and /bin/ls -f
3) Ran the script 10 times until the numbers stabilized
4) Disabled dir_index, rebooted and tried it all again
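The original script wasn't posted, but steps 1 and 2 can be roughly
reconstructed like this (N and the temp directory are placeholders):

```shell
#!/bin/bash
# Create N empty files, then time a sorted vs. an unsorted listing.
N=10000
dir=$(mktemp -d)
seq 1 "$N" | sed "s|^|$dir/f|" | xargs touch
created=$(/bin/ls "$dir" | wc -l)
time /bin/ls "$dir" > /dev/null     # sorted
time /bin/ls -f "$dir" > /dev/null  # unsorted (-f also implies -a)
rm -rf "$dir"
```

Repeating the run several times, as in step 3, lets the numbers settle
once the directory is in cache.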
This was on CentOS 5.3 with 512MB RAM, running on a VMware ESXi
hypervisor with no other VM running at the same time. I rebooted
between the tests with and without dir_index and waited for the load to
settle.
Linux vm-centos.gtirloni 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT
2009 i686 i686 i386 GNU/Linux
WITH DIR_INDEX:

  Files      ls       ls -f
  1000       0.00s    0.00s
  2500       0.01s    0.00s
  5000       0.03s    0.00s
  10000      0.07s    0.01s
  25000      0.21s    0.02s
  50000      0.45s    0.05s
  100000     0.99s    0.10s
  250000     2.83s    0.25s
  500000     6.04s    0.50s
  1000000    12.82s   0.99s
WITHOUT DIR_INDEX:

  Files      ls       ls -f
  1000       0.00s    0.00s
  2500       0.01s    0.00s
  5000       0.03s    0.00s
  10000      0.06s    0.00s
  25000      0.18s    0.01s
  50000      0.41s    0.03s
  100000     0.88s    0.05s
  250000     2.62s    0.14s
  500000     5.55s    0.28s
  1000000    11.77s   0.56s
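For reference, the post doesn't say exactly how dir_index was disabled;
the standard way on ext3 is via tune2fs on the unmounted filesystem
(/dev/sda1 below is a placeholder device):

```
tune2fs -O ^dir_index /dev/sda1   # turn directory hashing off
e2fsck -fD /dev/sda1              # re-check and optimize directories
tune2fs -O dir_index /dev/sda1    # turn it back on
```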
I can't explain why it took longer to finish with dir_index.
-Giovanni
--
redhat-list mailing list