Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

>>> +6. Skip fetching some submodules of a project into a NewHash
>>> +   repository. (This also depends on NewHash support in Git
>>> +   protocol.)
>>
>> It is unclear what this means.  Around submodule support, one thing
>> I can think of is that a NewHash tree in a superproject would record
>> a gitlink that is a NewHash commit object name in it, therefore it
>> cannot refer to an unconverted SHA-1 submodule repository.  But it
>> is unclear if the above description refers to the same issue, or
>> something else.
>
> It refers to that issue.

We may want to find a way to make that clear, then.

>> It makes me wonder if we want to add the hashname in this object
>> header.  "length" would be different for non-blob objects anyway,
>> and it is not "compat metadata" we want to avoid baking in, yet it
>> would help diagnose a mistake of attempting to use "mixed" objects
>> in a single repository.  Not a big issue, though.
>
> Do you mean that adding the hashname into the computation that
> produces the object name would help in some use case?

What I mean is that for SHA-1 objects we keep the object header as
"<type> <length> NUL".  For objects in the newer world, change the
object header to "<type> <hash> <length> NUL", and include the
hashname in the object name computation.

> For loose objects, it would be nice to name the hash in the file, so
> that "file" can understand what is happening if someone accidentally
> mixes types using "cp".  The only downside is losing the ability to
> copy blobs (which have the same content despite being named using
> different hashes) between repositories after determining their new
> names.  That doesn't seem like a strong downside --- it's pretty
> harmless to include the hash type in loose object files, too.  I think
> I would prefer this to be a "magic number" instead of part of the
> zlib-deflated payload, since this way "file" can discover it more
> easily.
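To be concrete, the two header conventions I have in mind would hash
like this (a rough sketch in Python; "sha256" below is only a
stand-in for NewHash, which this thread has not pinned down):

```python
import hashlib

def object_name_sha1(obj_type: bytes, data: bytes) -> str:
    # Current Git: the header is "<type> <length> NUL", prepended to
    # the content before hashing.
    header = obj_type + b" " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def object_name_newhash(obj_type: bytes, data: bytes) -> str:
    # Sketch of the proposal: "<type> <hash> <length> NUL", so that a
    # "mixed" object can be diagnosed from the header itself.  The
    # "sha256" token is a placeholder, not a decision.
    header = obj_type + b" sha256 " + str(len(data)).encode() + b"\0"
    return hashlib.sha256(header + data).hexdigest()
```

The point is only that the hashname participates in the name
computation; the exact spelling of the header is up for grabs.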
Yeah, thanks for doing the pros-and-cons for me ;-)

>> If it is a goal to eventually be able to lose SHA-1 compatibility
>> metadata from the objects, then we might want to remove SHA-1 based
>> signature bits (e.g. PGP trailer in signed tag, gpgsig header in the
>> commit object) from NewHash contents, and instead have them stored
>> in a side "metadata" table, only to be used while converting back.
>> I dunno if that is desirable.
>
> I don't consider that desirable.

Agreed.  Let's not go there.

>> Hmm, as the corresponding packfile stores object data only in
>> NewHash content format, it is somewhat curious that this table that
>> stores CRC32 of the data appears in the "Tables for each object
>> format" section, as they would be identical, no?  Unless I am
>> grossly misreading the spec, the checksum should either go outside
>> the "Tables for each object format" section but still in .idx, or
>> should be eliminated and become part of the packdata stream instead,
>> perhaps?
>
> It's actually only present for the first object format.  Will find a
> better way to describe this.

I see.  One way to do so is to have it upfront, before the "after
this point, these tables repeat for each of the hashes" part of the
file.

>> Oy.  So we can go from a short prefix to the pack location by first
>> finding it via binsearch in the short-name table, realizing that it
>> is the nth object in the object name order, and consulting this
>> table.  When we know the pack-order of an object, there is no direct
>> way to go to its location (short of reversing the
>> name-order-to-pack-order table)?
>
> An earlier version of the design also had a pack-order-to-pack-offset
> table, but we weren't able to think of any cases where that would be
> used without also looking up the object name that can be used to
> verify the integrity of the inflated object.
The primary thing I was interested in knowing was whether you tried
to think of any case where it may be useful and then failed to come
up with one.  I couldn't think of any myself, but I know I am not
imaginative enough, and I wanted to know that you guys didn't find
one, either.
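For the record, here is a toy model of the lookup path we were
talking about (illustrative Python; the names, offsets, and table
layout are all made up, not the actual .idx format):

```python
import bisect

# Hypothetical miniature index: object names sorted lexicographically
# (name order), a table mapping name order -> pack order, and the
# pack offsets stored in pack order.
names = ["1a2b", "3c4d", "9e8f", "beef"]   # sorted: name order
name_to_pack = [2, 0, 3, 1]                # name order -> pack order
offsets_in_pack_order = [12, 345, 678, 901]

def locate(prefix: str) -> int:
    """Abbreviated object name -> pack offset."""
    # Binsearch the sorted name table to learn "this is the nth
    # object in name order"...
    n = bisect.bisect_left(names, prefix)
    assert names[n].startswith(prefix)
    # ...then consult the name-order-to-pack-order table.
    return offsets_in_pack_order[name_to_pack[n]]

# Going the other way (pack order -> name/offset) has no direct
# table; it requires inverting name_to_pack, as the quoted exchange
# notes:
pack_to_name = [0] * len(name_to_pack)
for name_pos, pack_pos in enumerate(name_to_pack):
    pack_to_name[pack_pos] = name_pos
```

The dropped pack-order-to-pack-offset table would have made the
reverse direction a single array lookup; without it, a reader has to
materialize the inversion, which is what the design deemed
acceptable.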