On Wed, 2 Dec 2009, Peter Jones wrote:
(on my own tangent...)
On 12/02/2009 12:48 PM, Jesse Keating wrote:
I hypothesize that we could place all rpms for a given release
in a single directory (seth will hate this as he wants to split them up
based on first letter of their name for better filesystem performance),
Ugh, first letter isn't really a great plan anyway. First (few) letters
of a hash of the filename is much better, but obviously hurts browsability.
Next best is probably something like how a dead-tree dictionary index works;
list everything, split the list up by starting letters evenly, so the
directories (given a really unlikely hypothetical package set) are
0/ # contains packages named 0 through 3*
4/ # 4 through 9*
a/ # a through ay*
az/ # az through bw*
bx/ # bx through cz*
da/ # da through whatever's next
...
so that every directory has about the same number of things.
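
A rough sketch of the two alternatives described above, for illustration only. The function names (hash_bucket, balanced_buckets, bucket_for), the choice of SHA-1, the two-character hash width, and the sample package names are all assumptions made here, not anything proposed in the thread.

```python
import bisect
import hashlib


def hash_bucket(filename, width=2):
    """Directory name from the first `width` hex digits of a hash of the filename.

    SHA-1 and width=2 are arbitrary choices for this sketch.
    """
    return hashlib.sha1(filename.encode("utf-8")).hexdigest()[:width]


def balanced_buckets(names, bucket_count):
    """Dictionary-index style split: sort the names, cut the list into runs of
    roughly equal size, and label each run with the shortest prefix of its
    first name that sorts after the last name of the previous run."""
    ordered = sorted(set(names))
    if not ordered or bucket_count < 1:
        return {}
    size = -(-len(ordered) // bucket_count)  # ceiling division
    buckets = {}
    prev_last = None
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        first = chunk[0]
        label = first  # fallback: the whole name
        for n in range(1, len(first) + 1):
            if prev_last is None or first[:n] > prev_last:
                label = first[:n]
                break
        buckets[label] = chunk
        prev_last = chunk[-1]
    return buckets


def bucket_for(labels, name):
    """Find the directory for `name`: the largest label that sorts <= name.

    `labels` must be sorted; names sorting before the first label fall into it.
    """
    return labels[max(bisect.bisect_right(labels, name) - 1, 0)]


# Hypothetical package set, chosen only to resemble the listing above.
sample = ["0ad", "389-ds", "4ti2", "9base", "abiword", "aycd",
          "azureus", "bwm", "bxl", "czmq", "dash", "zsh"]
labels = list(balanced_buckets(sample, 6))
print(labels)                      # -> ['0', '4', 'a', 'az', 'bx', 'd']
print(bucket_for(labels, "bash"))  # -> az
print(hash_bucket("zsh"))          # two hex digits, not human-browsable
```

The balanced split keeps the directory names browsable, but the labels depend on the package set, so adding packages can shift the boundaries. The hash prefix is stable but, as noted above, hurts browsability.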
If you're looking for perfect division, sure - but the reality is this:
with 19K items in a single dir, ext3 and nfs and many many other things crap
themselves returning that list.
If you make 36 subdirs (26+10) performance gets DRAMATICALLY better for
producing the same list of files.
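
For reference, a minimal sketch of the 36-subdir layout (10 digits + 26 letters). The helper names and the decision to lowercase the first character are assumptions made for this example; the thread does not say how uppercase names would be handled. It only shows the layout and how the full list is reassembled, and does not measure anything.

```python
import os
import string

# 10 digits + 26 lowercase letters = 36 possible subdirectories.
BUCKETS = string.digits + string.ascii_lowercase


def first_char_bucket(filename):
    """Subdirectory name: the lowercased first character of the filename."""
    c = filename[0].lower()
    if c not in BUCKETS:
        raise ValueError("no bucket for %r" % filename)
    return c


def bucketed_path(root, filename):
    """Where a package file would live under the split layout."""
    return os.path.join(root, first_char_bucket(filename), filename)


def list_packages(root):
    """Rebuild the complete package list by reading each subdir in turn."""
    names = []
    for bucket in BUCKETS:
        subdir = os.path.join(root, bucket)
        if os.path.isdir(subdir):
            names.extend(os.listdir(subdir))
    return sorted(names)
```

Each listing call then only has to return the entries of one small directory instead of all 19K at once.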
I tested it on our backend to be sure. Getting the complete pkglist goes
from taking 5 minutes to taking 30s.
yes, I said 5 minutes.
-sv
--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list