Re: [PATCH v2 01/10] Define a structure for object IDs.

Duy Nguyen <pclouds@xxxxxxxxx> · Thu, 12 Mar 2015 18:16:49 +0700

On Thu, Mar 12, 2015 at 5:46 PM, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Mar 12, 2015 at 11:28:10AM +0100, Michael Haggerty wrote:
>>
>> On 03/12/2015 01:26 AM, Junio C Hamano wrote:
>>>
>>> And that would break the abstraction effort if you start calling the
>>> field with a name that is specific to the underlying hash function.
>>> The caller has to change o->sha1 to o->sha256 instead of keeping
>>> that as o->oid and letting the callee handle the implementation
>>> details when calling
>>>
>>>         if (!hashcmp(o1->oid, o2->oid))
>>>                 ; /* they are the same */
>>>         else
>>>                 ; /* they are different */
>>> [...]
>>
>>
>> Hmm, I guess you imagine that we might sometimes pack SHA-1s, sometimes
>> SHA-256s (or whatever) in the "oid" field, which would be dimensioned
>> large enough for either one (with, say, SHA-1s padded with zeros).
>>
>> I was imagining that this would evolve into a union (or maybe struct) of
>> different hash types, like
>>
>>    struct object_id {
>>        unsigned char hash_type;
>>        union {
>>            unsigned char sha1[GIT_SHA1_RAWSZ];
>>            unsigned char sha256[GIT_SHA256_RAWSZ];
>>        } hash;
>>    };
>>
>> BTW in either case, any hopes of mapping object_id objects directly on
>> top of buffer memory would disappear.
>
>
> What I think might be more beneficial is to make the hash function
> specific to a particular repository, and therefore you could maintain
> something like this:
>
>  struct object_id {
>      unsigned char hash[GIT_MAX_RAWSZ];
>  };
>
> and make hash_type (or hash_length) a global[0].  I don't think it's
> very worthwhile to try to mix two different hash functions in the same
> repository,

This may or may not fall into the "mix different hash functions"
category. In pack files version 4, trees are encoded to point to other
trees or blobs by a (pack, offset) tuple. It would be great if the new
object_id could support carrying this kind of object id around because
it could help reduce object lookup cost a lot. (pack, offset) can be
converted back to SHA-1 so no info is lost and hashcmp() can compare
(pack, tuple) against an SHA-1 just fine.

> so we could still map directly onto buffer memory if we
> decide that's portable enough.  I expect the cases where we need to do
> that will be relatively limited.
>
> Regardless, it seems that this solution has the most support (including
> Junio's) and it's more self-documenting than my current set of patches,
> so I'm going to go with it for now.  It should be easy to change if the
> consensus goes back the other way.
>
> [0] I personally think globals are a bit gross, but they don't seem to
> have the problems that they would if git were a shared library.
>
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html