On Sat, 31 Aug 2013, Duy Nguyen wrote: > On Tue, Aug 27, 2013 at 11:26 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote: > > A bit crud but good enough for now. > > I wonder if we should keep a short SHA-1 table in .idx. An entry in > the original .idx v1 table is <SHA-1>+<offset> then offset moved out > to make the table more compact for binary search. I observe that we > don't always need 20 byte SHA-1 to uniquely identify an entry in a > pack, so the SHA-1 table could be split in two: one table contain the > first part of SHA-1, long enough to identify any entry in the pack; > the second table contains either full SHA-1 or the remaining part. > Binary search is done on the first table, if matched, full sha-1 from > the second table is compared. We already have the second sha-1 table > in .pack v4. We could add the first table in .idx v3. > > On linux-2.6 even in one-pack configuration, we only need the first 6 > bytes of sha-1 to identify an object. That would cut the bsearch sha-1 > table to 1/4 full sha-1 table size. I don't see the point though. Why having two tables when only one suffice? If the argument is about making the SHA1 binary search more efficient given a smaller memory footprint, that comes with extra complexity coming from the variable length SHA1 strings in the second table. So I'm not sure there is much to gain. Furthermore, one of the design of pack v4 is to avoid the SHA1 binary search entirely. You will need the binary search only once to locate the top commit object, but from there the entire object tree can be walked simply by using the sha1ref index and looking up the corresponding offset into the pack index file to locate the next object. Same thing for walking tree objects: the pack v4 tree deltas are meant to be walked inline without having to _apply_ any delta. Nicolas -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html