On Oct 8, 2009, at 1:30 AM, Cameron Simpson wrote:
On 07Oct2009 16:57, Miner, Jonathan W (US SSA) <jonathan.w.miner@xxxxxxxxxxxxxx> wrote:
| The issue with 'ls' is that it wants to sort the output. You may
| want to try using "-f", which says "do not sort"
No, sorting is actually pretty cheap.
The issue with ls and large directories is usually the fact that ls
stat()s all the names. Plenty of other things need to stat() everything
too; backups of all kinds, for example. A stat() requires the OS to
search the directory to map the stat()ed name to an inode, and that's a
linear operation on ext3 if you haven't turned on directory hashing. In
consequence, the 'ls' cost goes as the square of the number of directory
entries (n names, each asking for a stat() whose cost is O(n), so
O(n^2) for the whole thing).
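A quick way to see the two costs separately (a sketch; the temp
directory and the 5000-file count are arbitrary choices): "ls -f" only
reads the directory and prints names, while "ls -l" must additionally
stat() every entry, paying the per-name lookup described above.

```shell
dir=$(mktemp -d)
seq 1 5000 | sed "s|^|$dir/f|" | xargs touch
count=$(ls -f "$dir" | wc -l)   # readdir only: 5000 files plus "." and ".."
time ls -f "$dir" > /dev/null   # readdir + print: no sort, no stat
time ls -l "$dir" > /dev/null   # sort + one stat() per name
rm -rf "$dir"
```

On an indexed filesystem the two timings stay close; on ext3 without
dir_index the gap widens as the directory grows.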
The usual approach is to make a tree of subdirectories to mitigate the
per-directory cost (keeping the n in that n^2 low).
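Sketched out, such a tree might hash each name into a two-level fan-out
of subdirectories so that no single directory grows huge. The "first
two hex digits of md5" bucket scheme below is just an illustration, not
any standard layout:

```shell
# Store a file under data/<bucket>/<name>, where <bucket> is the
# first two hex digits of the md5 of the name (256 possible buckets).
store() {
    name=$1
    bucket=$(printf '%s' "$name" | md5sum | cut -c1-2)
    mkdir -p "data/$bucket"
    touch "data/$bucket/$name"
}
store "example-file-0001"
```

With 256 buckets, a million files works out to roughly 4000 entries per
directory, so each linear lookup scans thousands of names instead of a
million.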
Just out of curiosity I did the following:
1) Created directories with sets of empty files
2) Created a script to time /bin/ls and /bin/ls -f
3) Ran the script 10 times until the numbers stabilized
4) Disabled dir_index, rebooted and tried it all again
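The original script wasn't posted, but steps 1 and 2 can be roughly
reconstructed like this (N and the temp directory are placeholders):

```shell
#!/bin/bash
# Create N empty files, then time a sorted vs. an unsorted listing.
N=10000
dir=$(mktemp -d)
seq 1 "$N" | sed "s|^|$dir/f|" | xargs touch
created=$(/bin/ls "$dir" | wc -l)
time /bin/ls "$dir" > /dev/null     # sorted
time /bin/ls -f "$dir" > /dev/null  # unsorted (-f also implies -a)
rm -rf "$dir"
```

Repeating the run several times, as in step 3, lets the numbers settle
once the directory is in cache.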
This was on CentOS 5.3 with 512MB RAM, running on a VMware ESXi
hypervisor with no other VM running at the same time. I rebooted
between the tests with and without dir_index and waited for the load to
settle.
Linux vm-centos.gtirloni 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT
2009 i686 i686 i386 GNU/Linux
WITH DIR_INDEX:

  Files      ls       ls -f
  1000       0.00s    0.00s
  2500       0.01s    0.00s
  5000       0.03s    0.00s
  10000      0.07s    0.01s
  25000      0.21s    0.02s
  50000      0.45s    0.05s
  100000     0.99s    0.10s
  250000     2.83s    0.25s
  500000     6.04s    0.50s
  1000000    12.82s   0.99s
WITHOUT DIR_INDEX:

  Files      ls       ls -f
  1000       0.00s    0.00s
  2500       0.01s    0.00s
  5000       0.03s    0.00s
  10000      0.06s    0.00s
  25000      0.18s    0.01s
  50000      0.41s    0.03s
  100000     0.88s    0.05s
  250000     2.62s    0.14s
  500000     5.55s    0.28s
  1000000    11.77s   0.56s
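For reference, the post doesn't say exactly how dir_index was disabled;
the standard way on ext3 is via tune2fs on the unmounted filesystem
(/dev/sda1 below is a placeholder device):

```
tune2fs -O ^dir_index /dev/sda1   # turn directory hashing off
e2fsck -fD /dev/sda1              # re-check and optimize directories
tune2fs -O dir_index /dev/sda1    # turn it back on
```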
I can't explain why it took longer to finish with dir_index.
-Giovanni
--
redhat-list mailing list