Re: Huge number of files in a directory

On Oct 8, 2009, at 1:30 AM, Cameron Simpson wrote:

On 07Oct2009 16:57, Miner, Jonathan W (US SSA) <jonathan.w.miner@xxxxxxxxxxxxxx> wrote:
| The issue with 'ls' is that it wants to sort the output. You may want to
| try using "-f", which says "do not sort"

No, sorting is actually pretty cheap.

The issue with ls and large directories is usually the fact that ls
stat()s all the names. Plenty of other things need to stat() everything
too; backups of all kinds, for example. A stat() requires the OS to
search the directory to map the stat()ed name to an inode, and that's a
linear operation on ext3 if you haven't turned on directory hashing. In
consequence, the 'ls' cost goes as the square of the number of directory
entries (n names, each asking for a stat() whose cost is O(n), so O(n^2)
for the whole thing).
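
For what it's worth, you can watch this directly with strace's syscall
summary. Whether plain ls stat()s every entry depends on the options and
aliases in play (Red Hat aliases ls to "ls --color=tty", and it's the
colouring that typically forces the per-entry lstat()), so treat this as a
sketch with an assumed test directory:

  # Compare syscall counts for a coloured listing vs. an unsorted plain one.
  strace -c /bin/ls --color=always /path/to/bigdir > /dev/null
  strace -c /bin/ls -f /path/to/bigdir > /dev/null

The first summary should show roughly one lstat()/lstat64() per directory
entry; the second should show almost none.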

The usual approach is to make a tree of subdirectories to mitigate the
per-directory cost (keeping each directory's n, and therefore the n^2,
small).
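
A common way to build that tree is to key the subdirectory on a hash of
the name, so no single directory ever grows large. Purely illustrative
sketch, with made-up paths and names:

  # Spread files over a two-level tree keyed on the first hex digits of an
  # md5 of the name, e.g. /data/store/ab/cd/some-object-id
  name="some-object-id"
  h=$(printf '%s' "$name" | md5sum | cut -c1-4)
  dir="/data/store/${h:0:2}/${h:2:2}"
  mkdir -p "$dir"
  mv "$name" "$dir/"

With two hex characters per level, each intermediate directory holds at
most 256 entries and the files end up spread over 65536 leaf directories,
so no single directory search has to walk millions of names.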

Just out of curiosity I did the following:

 1) Created directories with sets of empty files
 2) Created a script to time /bin/ls and /bin/ls -f
 3) Ran the script 10 times until the numbers stabilized
 4) Disabled dir_index, rebooted, and tried it all again (a rough sketch
    of these steps follows the list)
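
Roughly, the steps look like this (paths, counts and device names here are
just placeholders, not the exact script I ran):

  #!/bin/bash
  # Step 1: populate a directory with empty files (100000 as an example).
  dir=/mnt/test/manyfiles
  mkdir -p "$dir"
  ( cd "$dir" && for i in $(seq 1 100000); do : > "f$i"; done )

  # Steps 2 and 3: time both variants repeatedly until the numbers settle.
  for run in $(seq 1 10); do
      /usr/bin/time -f "ls:    %e s" /bin/ls    "$dir" > /dev/null
      /usr/bin/time -f "ls -f: %e s" /bin/ls -f "$dir" > /dev/null
  done

  # Step 4: disable dir_index (clear the feature flag, let e2fsck rebuild
  # the directories), then reboot and re-run the timings. Device name is a
  # placeholder:
  #   tune2fs -O ^dir_index /dev/sdXN
  #   e2fsck -fD /dev/sdXN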

This was on CentOS 5.3 with 512 MB RAM, running on a VMware ESXi hypervisor with no other VM running at the same time. I rebooted between the tests with and without dir_index and waited for the load to settle.

Linux vm-centos.gtirloni 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux

WITH DIR_INDEX:

Files / ls / ls -f

1000 files    / 0.00s  / 0.00s
2500 files    / 0.01s  / 0.00s
5000 files    / 0.03s  / 0.00s
10000 files   / 0.07s  / 0.01s
25000 files   / 0.21s  / 0.02s
50000 files   / 0.45s  / 0.05s
100000 files  / 0.99s  / 0.10s
250000 files  / 2.83s  / 0.25s
500000 files  / 6.04s  / 0.50s
1000000 files / 12.82s / 0.99s

WITHOUT DIR_INDEX:

Files / ls / ls -f

1000 files    / 0.00s  / 0.00s
2500 files    / 0.01s  / 0.00s
5000 files    / 0.03s  / 0.00s
10000 files   / 0.06s  / 0.00s
25000 files   / 0.18s  / 0.01s
50000 files   / 0.41s  / 0.03s
100000 files  / 0.88s  / 0.05s
250000 files  / 2.62s  / 0.14s
500000 files  / 5.55s  / 0.28s
1000000 files / 11.77s / 0.56s

I can't explain why it took longer to finish with dir_index.
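
For reference, the feature flag can be checked before each pass with
tune2fs (device name is just a placeholder):

  # dir_index should appear in the feature list only for the first run.
  tune2fs -l /dev/sda1 | grep -i 'features'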

-Giovanni




