Re: [doc] User Manual Suggestion

On Apr 26, 2009, at 7:28 AM, Björn Steinbrink wrote:

On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
It's relevant when the user notices that two distinct files have the same id (because they happen to have the same contents) and wonders
what's up.

Why would the user have to care about the object files in the repo?

What a strange question. I have no idea how to answer. It seems self-evident to me that users of a VCS care that their files are stored in it.

And why would your implementation save the same object twice, in two
distinct files?

One could easily have the expectation that contents can be duplicated because there are numerous precedents in everyone's experience of computing, for example in filesystems and in any programming language that is not pure-functional.

The SHA-1 hash is created from the object, that means
its type, size and data. It's not an id of a file in the working
tree, but of an object.

All true. All somewhat subtle distinctions that are not nearly as apparent unless you actually use the word "hash" as I have been advocating.
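To make the "type, size and data" point concrete, here is a minimal sketch of how a Git-style object id can be computed. It assumes the usual loose-object layout (a header of the form "<type> <size>" followed by a NUL byte, then the data) and SHA-1; the function name git_object_id and the sample byte strings are made up for illustration and are not part of Git.

```python
import hashlib


def git_object_id(data: bytes, obj_type: str = "blob") -> str:
    """Hash an object the way Git is assumed to here: over type, size and data."""
    # Header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Two distinct files that happen to have the same contents.
readme = b"hello world\n"
copy_of_readme = b"hello world\n"

print(git_object_id(readme))
print(git_object_id(readme) == git_object_id(copy_of_readme))  # True

# The type is part of what gets hashed, so the same bytes under a
# different type name yield a different id (illustration only; these
# bytes are not a valid commit).
print(git_object_id(readme, "blob") == git_object_id(readme, "commit"))  # False
```

The id depends only on the object, never on a path in the working tree, which is why two files with equal contents show up with the same id.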

It's not a foregone conclusion that objects with the same value have
identical ids, but it's immediately apparent if the id is known to be a
hash.

You can't have two objects with the same contents to begin with, same
content => same object.

In the Git world, I agree. In general, I disagree. The fact that it is so in the Git world is reinforced by the notion that the id of an object is a hash of its contents.

You can just have that one object stored
multiple times in different places (for sane implementations this likely means that you have more than one repo to look at, and each has its own
copy of that object, but that's nothing you as a user should have to
care about).

It's an identity relation: same name/id => same object. Unlike e.g. a
hash-table where you are expected to deal with collisions, and having
the same hash doesn't mean that you have identical data. But that's not
true of git, it expects an identity relation, which is IMHO better
expressed through "object name" or "object id".

Yes, that's true in the Git world (though not necessarily elsewhere), or at least you hope it is. There's no guarantee that SHA-1 collisions won't occur; they're just extremely unlikely. If you google it, you can find some interesting papers about SHA-1 collisions.

Another way to express what you wrote above:

   same id => same hash ?=> same contents => same object

where ?=> means "almost certainly implies." What you left out was the implication in the other direction, which is a true guarantee at all steps, and "hash" is well-understood to mean

   same contents => same hash
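A toy illustration of the guaranteed direction, "same contents => same hash", and what it buys you in storage. This is only a sketch of a content-addressed store, not how Git is implemented; the ObjectStore class, the tree dictionary and the file names are all hypothetical.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: each object is keyed by a hash of its contents."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        # Assumed Git-style blob id: SHA-1 over "blob <size>\0" plus the data.
        oid = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
        # Storing the same contents again is a no-op: the key already exists.
        self.objects.setdefault(oid, data)
        return oid


store = ObjectStore()

# Three paths in a working tree, two of them with identical contents.
tree = {
    "README": store.put(b"same text\n"),
    "docs/README": store.put(b"same text\n"),
    "LICENSE": store.put(b"other text\n"),
}

print(tree["README"] == tree["docs/README"])  # True: same contents, same id
print(len(store.objects))                     # 2: only two objects are stored
```

Going the other way, from equal ids back to equal contents, rests on SHA-1 collisions being extremely unlikely rather than on anything this code can guarantee.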

You can still say that
the name/id is generated by using a hash function, but the important
part is that the name/id is used to _uniquely_ identify an object, which
isn't apparent when you call it a hash.


I think the implication is important in both directions. Neither one is self-evident to a new user. Maybe the right answer is 'hash id'.

--
David Abrahams
BoostPro Computing
http://boostpro.com



