Re: Typesafer git hash patch

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Tue, 28 Feb 2017 20:19:19 +0000

On Tue, Feb 28, 2017 at 11:53:28AM -0800, Junio C Hamano wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > That
> > actually can clean up some code, because right now we have duplicate
> > interfaces for some things that take an oid pointer and others take a
> > "const unsigned char *sha1", and that duplication essentially can go
> > away.
> 
> ... I understand that this is an eventual goal of "struct object_id"
> effort.

Yes, it is.

> > ..., and the lines are generally shorter, eg
> >
> > -               const unsigned char *mb = result->item->object.oid.hash;
> > -               if (!hashcmp(mb, current_bad_oid->hash)) {
> >
> > turns into
> >
> > +               const hash_t *mb = &result->item->object.oid;
> > +               if (!hashcmp(mb, current_bad_oid)) {
> 
> Hmph.  I somehow thought the longer term directio for the above code
> would be to turn it into
> 
> 		if (!oidcmp(&result->item->object.oid, &current_bad_oid))
> 
> instead.

It is.  The duplication is temporary.  We've actually removed some of
the original SHA-1 functions in favor of the object_id versions.

> Having said all that, I do not offhand see a huge benefit of the
> current layout that has one layer between the hash (i.e. oid.hash)
> and the object name (i.e. oid) over "there is no need for oid.hash;
> oid is just a hash", which you seem to be doing.
> 
> Except for cases where we need to be prepared to see two or more
> kinds of object names (in which case struct object_id would have an
> enum that tells you if oid.hash is SHA-1 or SHA-3-256), that is.  Of
> course we can encode that in hash_t itself (e.g. multihash).

Right, this is indeed a benefit.

The bigger issue is the assumptions in the code base that assume a given
hash size.  These assumptions are in a bunch of places, so pretty much
any time you see a number, you end up having to question what it means.
We have a 64 that means 40 (SHA-1 hex) plus 24 other values, for
example.

I've sent various series because I think that's the way that the Git
development community has most welcomed my contributions.  I have
another 69 patches in two series (which are as of yet untested).  I can
probably have almost all of the transition actually complete in another
couple weeks (because I have a day job which does not involve working on
Git).

I've hesitated to send additional patches to the list when my existing
changes haven't even hit next yet, since they're received little actual
testing.

I'm unsure if sending one enormous patch is actually going to be
welcome; I feel Junio and most of the regulars are probably not going to
be very excited about the idea, mostly because the reviewing task is
enormous, and the risk for breakage is also pretty high.  We want to
move away from SHA-1 as quickly as possible, yes, but we don't want Git
to become unusably buggy.

I guess what I'm saying is that I'm fine if we want to have a more
aggressive timeline, and I can probably make that happen, at least if we
can wait a few weeks.  If Junio and the other regulars are comfortable
merging one omnibus patch to get it moving even more quickly, I won't
stand in the way.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
Attachment:
signature.asc

Description: PGP signature