Re: [PATCH 23/23] initial pack index v3 support on the read side

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 3, 2013 at 1:09 PM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Sat, 31 Aug 2013, Duy Nguyen wrote:
>
>> On Tue, Aug 27, 2013 at 11:26 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
>> > A bit crud but good enough for now.
>>
>> I wonder if we should keep a short SHA-1 table in .idx. An entry in
>> the original .idx v1 table is <SHA-1>+<offset> then offset moved out
>> to make the table more compact for binary search. I observe that we
>> don't always need 20 byte SHA-1 to uniquely identify an entry in a
>> pack, so the SHA-1 table could be split in two: one table contain the
>> first part of SHA-1, long enough to identify any entry in the pack;
>> the second table contains either full SHA-1 or the remaining part.
>> Binary search is done on the first table, if matched, full sha-1 from
>> the second table is compared. We already have the second sha-1 table
>> in .pack v4. We could add the first table in .idx v3.
>>
>> On linux-2.6 even in one-pack configuration, we only need the first 6
>> bytes of sha-1 to identify an object. That would cut the bsearch sha-1
>> table to 1/4 full sha-1 table size.
>
> I don't see the point though.
>
> Why having two tables when only one suffice?
>
> If the argument is about making the SHA1 binary search more efficient
> given a smaller memory footprint, that comes with extra complexity
> coming from the variable length SHA1 strings in the second table.  So
> I'm not sure there is much to gain.
>
> Furthermore, one of the design of pack v4 is to avoid the SHA1 binary
> search entirely.  You will need the binary search only once to locate
> the top commit object, but from there the entire object tree can be
> walked simply by using the sha1ref index and looking up the
> corresponding offset into the pack index file to locate the next object.
>
> Same thing for walking tree objects: the pack v4 tree deltas are meant
> to be walked inline without having to _apply_ any delta.

Hmm.. you are right. We avoid a lot of SHA-1 lookups with .pack v4. A
more compact SHA-1 table only benefits random SHA-1 lookups, which
should be much fewer compared to "rev-list --objects --all".
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]