Re: [PATCH 1/3] Lazily open pack index files on demand

"Dana How" <danahow@xxxxxxxxx> · Sat, 26 May 2007 21:40:42 -0700

On 5/26/07, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
Dana How <danahow@xxxxxxxxx> wrote:
> Shawn: When I first saw the index-loading code, my first
> thought was that all the index tables should be
> merged (easy since sorted) so callers only need to do one search.

Yes; in fact this has been raised on the list before.  The general
idea was to create some sort of "super index" that had a list of
all objects and which packfile they could be found in.  This way the
running process doesn't have to search multiple indexes, and the
process doesn't have to be responsible for the merging itself.
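To make the "super index" idea concrete, here is a minimal sketch, assuming each pack's index is just a sorted list of object ids held in memory. The names build_super_index and find_pack are hypothetical and are not part of git; the real .idx format is binary and is not modelled here.

```python
import heapq
from bisect import bisect_left


def tagged(pack_name, oids):
    # Pair every object id with the pack that holds it.
    return [(oid, pack_name) for oid in oids]


def build_super_index(pack_tables):
    # pack_tables maps a (hypothetical) pack name to its sorted object ids.
    # Each input is already sorted, so a k-way merge yields one sorted table.
    return list(heapq.merge(*(tagged(n, o) for n, o in pack_tables.items())))


def find_pack(super_index, oid):
    # A single binary search replaces one search per pack.
    i = bisect_left(super_index, (oid, ""))
    if i < len(super_index) and super_index[i][0] == oid:
        return super_index[i][1]
    return None


packs = {"pack-a": ["11", "aa"], "pack-b": ["22", "bb"]}
super_index = build_super_index(packs)
```

The merge touches every entry of every pack before the first lookup can be answered, which is the cost discussed below.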

See, the thing is, if you read all of every .idx file on a simple
`git-log` operation you've already lost.  The number of trees and
blobs tends to far outweigh the number of commits and they really
outweigh the number of commits the average user looks at in a
`git-log` session before they abort their pager.  So sorting all
of the available .idx files before we produce even the first commit
is a horrible thing to do.
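The alternative in the subject line, opening each index only when a lookup actually reaches it, can be sketched roughly as follows. This is an illustration and not the patch's code: LazyPackIndex, its loader callback, and find_pack are hypothetical names, and the index is again modelled as a sorted list of object ids.

```python
from bisect import bisect_left


class LazyPackIndex:
    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # called at most once, on first lookup
        self._oids = None

    @property
    def loaded(self):
        return self._oids is not None

    def contains(self, oid):
        if self._oids is None:
            # Pay the cost of reading the index only when it is needed.
            self._oids = self._loader()
        i = bisect_left(self._oids, oid)
        return i < len(self._oids) and self._oids[i] == oid


def find_pack(indexes, oid):
    # Probe packs in order; packs after the first hit are never opened.
    for idx in indexes:
        if idx.contains(oid):
            return idx.name
    return None


indexes = [
    LazyPackIndex("pack-a", lambda: ["11", "aa"]),
    LazyPackIndex("pack-b", lambda: ["22", "bb"]),
]
hit = find_pack(indexes, "aa")
```

In this sketch a lookup that misses still opens every index, so the saving depends on most lookups hitting an early pack.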

But the problem with a super index is repacking.  Every time the user
repacks their recent loose objects (or recently fetched packs) we are
folding some packfiles together, but may be leaving others alone.
The super index would need to account for the packfiles we aren't
looking at or repacking.  It gets complicated fast.

There's also the problem of alternate ODBs; do we fold the indexes
of our alternates into our own super index?  Or does each ODB get
its own super index and we still have to load multiple super index
files?

Yes, the problem is that even an on-demand, "lazy" merge
is likely to require far more work than the expected number of index probes.
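A rough back-of-the-envelope comparison, counting comparisons only. The pack sizes and lookup count are made-up numbers for illustration, and probe_cost and merge_cost are hypothetical helpers: probing is modelled as a binary search in every pack for every lookup (worst case), merging as a k-way heap merge over all entries.

```python
import math


def probe_cost(pack_sizes, lookups):
    # Worst case: each lookup binary-searches every pack's table.
    per_lookup = sum(math.ceil(math.log2(n)) for n in pack_sizes)
    return lookups * per_lookup


def merge_cost(pack_sizes):
    # A k-way merge handles every entry once, at about log2(k) comparisons each.
    total = sum(pack_sizes)
    return total * max(1, math.ceil(math.log2(len(pack_sizes))))


sizes = [1_000_000, 50_000, 2_000]  # assumed objects per pack
probes = probe_cost(sizes, lookups=200)  # assumed lookups in a short session
merge = merge_cost(sizes)
```

Under these assumptions the merge costs a couple of orders of magnitude more than the probes, and the gap grows with repository size, since merge cost scales with the total object count while probe cost scales with the number of lookups.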

In pack v4 we're likely to move the SHA-1 table from the .idx file
into the front of the .pack file.  This makes the .idx file hold
only the offsets and the CRC checksums of each object.  If we start
making a super index, we have to duplicate the SHA-1 table twice
(once in the .pack, again in the super index).

Hmm, hopefully the SHA-1 table can go at the _end_
since with split packs that's the only time we know the number
of objects in the pack... ;-)

Thanks,
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell