On Tue, Sep 3, 2013 at 1:09 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote: > On Sat, 31 Aug 2013, Duy Nguyen wrote: > >> On Tue, Aug 27, 2013 at 11:26 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote: >> > A bit crud but good enough for now. >> >> I wonder if we should keep a short SHA-1 table in .idx. An entry in >> the original .idx v1 table is <SHA-1>+<offset> then offset moved out >> to make the table more compact for binary search. I observe that we >> don't always need 20 byte SHA-1 to uniquely identify an entry in a >> pack, so the SHA-1 table could be split in two: one table contain the >> first part of SHA-1, long enough to identify any entry in the pack; >> the second table contains either full SHA-1 or the remaining part. >> Binary search is done on the first table, if matched, full sha-1 from >> the second table is compared. We already have the second sha-1 table >> in .pack v4. We could add the first table in .idx v3. >> >> On linux-2.6 even in one-pack configuration, we only need the first 6 >> bytes of sha-1 to identify an object. That would cut the bsearch sha-1 >> table to 1/4 full sha-1 table size. > > I don't see the point though. > > Why having two tables when only one suffice? > > If the argument is about making the SHA1 binary search more efficient > given a smaller memory footprint, that comes with extra complexity > coming from the variable length SHA1 strings in the second table. So > I'm not sure there is much to gain. > > Furthermore, one of the design of pack v4 is to avoid the SHA1 binary > search entirely. You will need the binary search only once to locate > the top commit object, but from there the entire object tree can be > walked simply by using the sha1ref index and looking up the > corresponding offset into the pack index file to locate the next object. > > Same thing for walking tree objects: the pack v4 tree deltas are meant > to be walked inline without having to _apply_ any delta. Hmm.. you are right. We avoid a lot of SHA-1 lookups with .pack v4. A more compact SHA-1 table only benefits random SHA-1 lookups, which should be much fewer compared to "rev-list --objects --all". -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html