Re: [PATCH 25/33] notes-merge: convert verify_notes_filepair to struct object_id

Junio C Hamano <gitster@xxxxxxxxx> · Sat, 03 Jun 2017 08:49:39 +0900

Brandon Williams <bmwill@xxxxxxxxxx> writes:

> On 06/02, Junio C Hamano wrote:
>> 
>> > -static int path_to_sha1(const char *path, unsigned char *sha1)
>> > +static int path_to_oid(const char *path, struct object_id *oid)
>> >  {
>> > -	char hex_sha1[40];
>> > +	char hex_oid[GIT_SHA1_HEXSZ];
>> >  	int i = 0;
>> > -	while (*path && i < 40) {
>> > +	while (*path && i < GIT_SHA1_HEXSZ) {
>> >  		if (*path != '/')
>> > -			hex_sha1[i++] = *path;
>> > +			hex_oid[i++] = *path;
>> >  		path++;
>> >  	}
>> 
>> It's no brainer to do s/GIT_SHA1_HEXSZ/GIT_MAX_HEXSZ/ for all of the
>> above, but ...
>
> I'll fix this.
>
>> 
>> > -	if (*path || i != 40)
>> > +	if (*path || i != GIT_SHA1_HEXSZ)
>> >  		return -1;
>> 
>> ... this one is tricky.  
>> 
>> What's in our envisioned future?  Are we expecing to see object
>> names, named with two or more hash functions, in a same repository?
>> If so, and one is 20 bytes and another one is 32 bytes, then this
>> should check 'i' against 40 and 64 and pass if 'i' is one of these
>> expected lengths?
>
> That's a good question, and at this point in time do we have an
> envisioned future?  From some of our conversations I seem to remember
> that we don't want a single repository to have objects based on two
> different hash functions, but rather some translation layer to convert
> between two hashing functions (for compatibility with other
> non-converted repos).  Though nothing has been settled upon yet so I
> don't know what the future would look like (and the best way to prepare
> for it).

I do not think we know precisely what we want yet.  The "MAX" in the
allocation size is an indication that we are allocating for the
largest variant possible in the version of Git being compiled, which
is a hint that we anticipate that we may have multiple variants with
different sizes.  And that is where my "against 40 and 64 ... one of
these expected lengths" come from.  

But there is no such set of macros that define acceptable/expected
lengths and that tells me that nobody among who defined, reviewed
and accepted the MAX macro has thought this part through yet.

I see Jonathan started talking about experiments with "different
hash" elsewhere; I am reading his message as assuming one hash at a
time, and the repository tells us what the hash is.  And I think
that is one plausible design.

If we were to go that route, then

    - MAX will be the maximum among hashes we can choose at the
      beginning of a process (perhaps by consulting the repository
      extension).

    - In addition to GIT_SHA1_HEXSZ, GIT_SHA2_HEXSZ, and friends, we
      need a variable that records the length of the hash chosen for
      the process at the startup, GIT_OIDHASH_HEXSZ.  That would
      become one of the fields in your "struct repository",

        #define GIT_OIDHASH_HEXSZ (the_repository.oidhash_hexsz)

      among other things like what the oid for an empty blob is.

I'd say in the meantime we can do something like the attached and
use it here, perhaps?

 cache.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cache.h b/cache.h
index 53999418d3..cafb13db9f 100644
--- a/cache.h
+++ b/cache.h
@@ -70,6 +70,10 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
 
+/* The length for the object name hash */
+#define GIT_OIDHASH_RAWSZ GIT_SHA1_RAWSZ
+#define GIT_OIDHASH_HEXSZ GIT_SHA1_HEXSZ
+
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
 };