Re: Lets avoid the SHA-1 term (was [doc] User Manual Suggestion)

Michael J Gruber <git@xxxxxxxxxxxxxxxxxxxx> · Mon, 27 Apr 2009 14:06:25 +0200

Felipe Contreras venit, vidit, dixit 27.04.2009 01:38:
> 2009/4/27 Björn Steinbrink <B.Steinbrink@xxxxxx>:
>> On 2009.04.24 20:48:57 -0400, David Abrahams wrote:
>>>
>>> On Apr 24, 2009, at 8:01 PM, Michael Witten wrote:
>>>
>>>>> What's wrong with just calling the object name "object name"?
>>>>
>>>> What's wrong with calling the object address "object address"?
>>>
>>> Neither captures the connection to the object's contents.  I think
>>> "value ID" would be closer, but it's probably too horrible.
>>
>> I think I asked this in another mail, but I'm quite tired, so just to
>> make sure: What do you mean by "value"? I might be weird (I'm not a
>> native speaker, so I probably make funny and wrong connotations from
>> time to time), but while I can accept "content" to include the type and
>> size of the object, the term "value" makes me want to exclude those
>> pieces of meta data. So "value" somehow feels wrong to me, as the hash
>> covers those two fields.
> 
> Just to summarize.
> 
> Do you agree that SHA-1 is not the proper term to choose?
> 
> Do you agree that either 'id' or 'hash' would work fine?
> 
> Personally I think there's an advantage of choosing 'hash'; if we pick
> 'id' then the user might think that he can change the contents of the
> object while keeping the same id, if we pick 'hash' then it's obvious
> the 'id' is tied to the content and why.
> 

Apparently a branch of that thread touched the "[PATCH 0/2] Unify use of
[sha,SHA][,-]1", so I'll do a cc merge, feeling entitled to summarize
the latter:

- There are two SHA-1ish things we talk about: the SHA-1 hash
algorithm/function on the one hand and git object names on the other hand.

- The object name of a file is not the SHA-1 checksum of its contents:
That's more or less obvious because there are no files in git, only
objects. The object name is the SHA-1 of a representation of an object
(which, for blobs, consists of header + content).

- There seemed to be an implicit claim that the Doc uses SHA-1 for the
algorithm and sha1/SHA1 for the object name. That's not supported by the
facts (see below) and is not practical.

- The glossary defines SHA1 to be equivalent to the object name and does
not mention any other spelling.
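
To illustrate the second point, here is a minimal Python sketch of how a
blob's object name differs from the plain SHA-1 checksum of the file
contents. It assumes the usual blob header of the form "blob <size>\0";
the function names are made up for illustration and are not git APIs.

```python
import hashlib


def plain_sha1(content: bytes) -> str:
    """SHA-1 checksum of the raw bytes, with no git header."""
    return hashlib.sha1(content).hexdigest()


def blob_object_name(content: bytes) -> str:
    """Object name of a blob: SHA-1 over header + content.

    Assumed header layout: b"blob ", the content size in decimal,
    then a single NUL byte.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    data = b""
    print(plain_sha1(data))        # da39a3ee5e6b4b0d3255bfef95601890afd80709
    print(blob_object_name(data))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The two values differ even for empty content, because the header (type
and size) is part of what gets hashed.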

The stats (line counts for simplicity) and facts for Documentation/ are:

SHA-1: 56
Used exclusively for the object name.

SHA1: 73
Used mostly for the object name, but also for the patch-id (SHA-1
checksum of patch), in the tutorial, and pack-format, i.e. in places
where the actual hash algorithm/function is mentioned.

sha1: 102
Used all over the place, mostly for the object name and when quoting
from the source. I don't think it's used for the hash algorithm/function.

sha-1: 0
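
A per-spelling line count of this kind could be reproduced roughly as in
the following Python sketch. It is only an assumption about the method
(case-sensitive substring match, one count per matching line), not a
record of how the numbers above were obtained; the function name is
hypothetical.

```python
SPELLINGS = ["SHA-1", "SHA1", "sha1", "sha-1"]


def count_lines_by_spelling(text):
    """Count, for each spelling, the lines that contain it.

    Matching is a case-sensitive substring test, so a line that uses
    two spellings is counted once under each of them.
    """
    counts = {spelling: 0 for spelling in SPELLINGS}
    for line in text.splitlines():
        for spelling in SPELLINGS:
            if spelling in line:
                counts[spelling] += 1
    return counts


if __name__ == "__main__":
    sample = "the SHA-1 of a blob\nSHA1 and sha1 here\nsha1 again\n"
    print(count_lines_by_spelling(sample))
```

Note that a substring match also counts identifiers quoted from the
source that merely contain "sha1", so it says nothing about which
meaning a given line intends.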

So, the current confusion is mostly due to the fact that 3 different
names are used for the same thing (object name) and to a much lesser
degree to the fact that the same name (SHA1) is used for 2 different
things (hash algorithm/function vs. object name).

My patch tried to lessen the confusion by naming one thing by 1 name
only (SHA-1). It continued the tradition of identifying the object name
with the hash algorithm which is used in forming that name. I don't
think it matters much (confusion-wise) which of those 3 we choose. It
would be easy to rewrite the patch to use SHA1 or sha1 instead of SHA-1
(and I'd be willing to), as long as one spelling is used consistently.

An alternative patch would substitute most occurrences of the above by
X, X being the future term for "object name" to be agreed upon, and go
for, say, SHA-1 in the very few places where the actual algorithm is
mentioned. I just don't want to bet on that agreement and patch happening.

Michael