On 5/26/07, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
Dana How <danahow@xxxxxxxxx> wrote:
> Shawn: When I first saw the index-loading code, my first
> thought was that all the index tables should be
> merged (easy since sorted) so callers only need to do one search.

Yes; in fact this has been raised on the list before. The general idea was to create some sort of "super index" that had a list of all objects and which packfile they could be found in. This way the running process doesn't have to search multiple indexes, and the process doesn't have to be responsible for the merging itself.

See the thing is, if you read all of every .idx file on a simple `git-log` operation you've already lost. The number of trees and blobs tends to far outweigh the number of commits and they really outweigh the number of commits the average user looks at in a `git-log` session before they abort their pager. So sorting all of the available .idx files before we produce even the first commit is a horrible thing to do.

But the problem with a super index is repacking. Every time the user repacks their recent loose objects (or recently fetched packs) we are folding some packfiles together, but may be leaving others alone. The super index would need to account for the packfiles we aren't looking at or repacking. It gets complicated fast.

There's also the problem of alternate ODBs; do we fold the indexes of our alternates into our own super index? Or does each ODB get its own super index and we still have to load multiple super index files?
Yes, the problem is that even an on-demand, "lazy" merge is likely to require far more work than the expected number of index probes.
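To make the trade-off concrete, here is a rough Python sketch (not git's actual code) that contrasts probing each pack's sorted table separately with building one merged "super index" up front. Everything in it is an assumption for illustration: the function names, the use of plain sorted lists of hex SHA-1 strings in place of real .idx files (no fan-out table, no offsets), and the cost functions, which are only crude comparison-count estimates.

```python
import bisect
import heapq
import math


def probe(indexes, sha):
    """Binary-search each pack's sorted SHA-1 table in turn (hypothetical helper)."""
    for pack_name, table in indexes.items():
        i = bisect.bisect_left(table, sha)
        if i < len(table) and table[i] == sha:
            return pack_name
    return None


def _tag(pack_name, table):
    # Yield (sha, pack) pairs in sorted order so heapq.merge can combine them.
    for sha in table:
        yield sha, pack_name


def build_super_index(indexes):
    """k-way merge of every per-pack table into one sorted (sha, pack) list."""
    return list(heapq.merge(*[_tag(n, t) for n, t in indexes.items()]))


def super_lookup(super_index, sha):
    """Single binary search in the merged table."""
    # (sha,) sorts before any (sha, pack) tuple, so this finds the first match.
    i = bisect.bisect_left(super_index, (sha,))
    if i < len(super_index) and super_index[i][0] == sha:
        return super_index[i][1]
    return None


def probe_cost(indexes, lookups):
    """Crude upper bound: one full binary search per pack per lookup."""
    per_lookup = sum(math.ceil(math.log2(len(t) + 1)) for t in indexes.values())
    return lookups * per_lookup


def merge_cost(indexes):
    """Crude lower bound: a merge has to touch every entry at least once."""
    return sum(len(t) for t in indexes.values())
```

With three packs of 1000 objects each, this estimate gives about 30 comparisons per lookup against at least 3000 entries touched by the merge, so the merge only starts to pay off after roughly a hundred lookups. A short `git-log` session that is aborted in the pager may never get there, which is the point made above.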
In pack v4 we're likely to move the SHA-1 table from the .idx file into the front of the .pack file. This makes the .idx file hold only the offsets and the CRC checksums of each object. If we start making a super index, we have to duplicate the SHA-1 table twice (once in the .pack, again in the super index).
Hmm, hopefully the SHA-1 table can go at the _end_ since with split packs that's the only time we know the number of objects in the pack... ;-)

Thanks,
--
Dana L. How danahow@xxxxxxxxx +1 650 804 5991 cell