Brandon Williams <bmwill@xxxxxxxxxx> writes: > On 06/02, Junio C Hamano wrote: >> >> > -static int path_to_sha1(const char *path, unsigned char *sha1) >> > +static int path_to_oid(const char *path, struct object_id *oid) >> > { >> > - char hex_sha1[40]; >> > + char hex_oid[GIT_SHA1_HEXSZ]; >> > int i = 0; >> > - while (*path && i < 40) { >> > + while (*path && i < GIT_SHA1_HEXSZ) { >> > if (*path != '/') >> > - hex_sha1[i++] = *path; >> > + hex_oid[i++] = *path; >> > path++; >> > } >> >> It's no brainer to do s/GIT_SHA1_HEXSZ/GIT_MAX_HEXSZ/ for all of the >> above, but ... > > I'll fix this. > >> >> > - if (*path || i != 40) >> > + if (*path || i != GIT_SHA1_HEXSZ) >> > return -1; >> >> ... this one is tricky. >> >> What's in our envisioned future? Are we expecing to see object >> names, named with two or more hash functions, in a same repository? >> If so, and one is 20 bytes and another one is 32 bytes, then this >> should check 'i' against 40 and 64 and pass if 'i' is one of these >> expected lengths? > > That's a good question, and at this point in time do we have an > envisioned future? From some of our conversations I seem to remember > that we don't want a single repository to have objects based on two > different hash functions, but rather some translation layer to convert > between two hashing functions (for compatibility with other > non-converted repos). Though nothing has been settled upon yet so I > don't know what the future would look like (and the best way to prepare > for it). I do not think we know precisely what we want yet. The "MAX" in the allocation size is an indication that we are allocating for the largest variant possible in the version of Git being compiled, which is a hint that we anticipate that we may have multiple variants with different sizes. And that is where my "against 40 and 64 ... one of these expected lengths" come from. But there is no such set of macros that define acceptable/expected lengths and that tells me that nobody among who defined, reviewed and accepted the MAX macro has thought this part through yet. I see Jonathan started talking about experiments with "different hash" elsewhere; I am reading his message as assuming one hash at a time, and the repository tells us what the hash is. And I think that is one plausible design. If we were to go that route, then - MAX will be the maximum among hashes we can choose at the beginning of a process (perhaps by consulting the repository extension). - In addition to GIT_SHA1_HEXSZ, GIT_SHA2_HEXSZ, and friends, we need a variable that records the length of the hash chosen for the process at the startup, GIT_OIDHASH_HEXSZ. That would become one of the fields in your "struct repository", #define GIT_OIDHASH_HEXSZ (the_repository.oidhash_hexsz) among other things like what the oid for an empty blob is. I'd say in the meantime we can do something like the attached and use it here, perhaps? cache.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/cache.h b/cache.h index 53999418d3..cafb13db9f 100644 --- a/cache.h +++ b/cache.h @@ -70,6 +70,10 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long); #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ +/* The length for the object name hash */ +#define GIT_OIDHASH_RAWSZ GIT_SHA1_RAWSZ +#define GIT_OIDHASH_HEXSZ GIT_SHA1_HEXSZ + struct object_id { unsigned char hash[GIT_MAX_RAWSZ]; };