Re: [doc] User Manual Suggestion

Björn Steinbrink <B.Steinbrink@xxxxxx> · Sun, 26 Apr 2009 20:12:44 +0200

On 2009.04.26 11:36:04 -0500, Michael Witten wrote:
> 2009/4/26 Björn Steinbrink <B.Steinbrink@xxxxxx>:
> > On 2009.04.25 15:36:24 -0400, David Abrahams wrote:
> >> Where it's relevant when the user notices that two distinct files have
> >> the same id (because they happen to have the same contents) and wonders
> >> what's up.
> >...
> > And why would your implementation save the same object twice, in two
> > distinct files?
> 
> This question makes me think that you don't understand the parent's
> point. He's not talking about implementation details; in fact, there's
> no reason to mix the git world and the file system world at all in
> this discussion.
> 
> David is pointing out that a user might notice that two different
> trees list the same blob. This can be startling if you have an
> incomplete picture of what's going on.

David said that the user encounters two distinct files with the same id.
The ids are properties of the objects. So he must have meant object
files, or he attributed the id to the wrong thing. I assumed that he
didn't mix those things up and really meant the object files, thus my
reply.
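
To illustrate why identical contents give identical ids, here is a
minimal Python sketch of how a blob id is derived: SHA-1 over a
"blob <size>\0" header followed by the contents. The function name
git_blob_id is made up for this example and is not part of git. The
file name never enters the hash, so two files with the same contents
map to one blob.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the id git assigns to a blob with these contents.

    Illustrative helper (hypothetical name): the id is the SHA-1 of
    the header "blob <size>\\0" followed by the raw contents.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Two "files" with different names but the same contents.
readme = b"hello\n"
copy_of_readme = b"hello\n"

# Same contents, same id: only one blob object is needed.
print(git_blob_id(readme) == git_blob_id(copy_of_readme))
```

The result should match what "git hash-object <file>" reports for a
file with the same contents.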

> From a practical point of view, you might argue that not too many
> people are looking at trees and blobs;

Heh, I'd rather argue that too _few_ people have looked at commits and
trees at least once, whether it's an actual object or a graph like in
"Git for Computer Scientists".

> however, it seems to me that most people are afraid to use any of
> git's most useful features precisely because they don't understand the
> git model and they don't understand that nothing is ever lost unless
> you explicitly clean up unreferenced objects---they don't see how easy
> it is to manipulate their repos. I argue that if they are given the full
> knowledge of git's concepts, then they will be able to reason about
> their repo actions with confidence, even if they only work with
> commits.

Agreed.

> I think the key is to stress in the documentation the idea that there
> are 2 separate worlds (the git object world and the working
> directory's file system world) and that the git tools provide an
> interface between them; this seems like a small and unnecessarily
> academic point, but I believe that it's important to working with
> confidence.

Agreed. That's also why I asked David why the user would look at the
object files in the repo (the .git dir). To some degree those are also
an implementation detail. The user works with the working tree and uses
the git tools to modify the repo.

> > It's an identity relation: same name/id => same object. Unlike e.g. a
> > hash-table where you are expected to deal with collisions, and having
> > the same hash doesn't mean that you have identical data.
> 
> However, having the same *cryptographic* hash does mean that you have
> identical data.

That's the _assumption_ that git makes. Hash collisions are always
possible, just hard to create intentionally when the hash function has
not yet been broken.
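
A small Python sketch of why collisions must exist in principle: once
the hash has fewer possible values than there are inputs, two inputs
have to share a value. The toy hash below (tiny_hash, a name invented
for this example) keeps only the first byte of SHA-1, so 257 distinct
inputs are guaranteed to contain a collision. Full SHA-1 has 2^160
possible values, which is why git can treat "same id" as "same
object" in practice.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Toy hash for illustration: first byte of SHA-1 (256 values)."""
    return hashlib.sha1(data).digest()[0]


def find_collision() -> tuple[bytes, bytes]:
    """Return two distinct inputs that share the same tiny_hash value."""
    seen: dict[int, bytes] = {}
    # 257 distinct inputs into 256 buckets: the pigeonhole principle
    # guarantees a repeat before the loop ends.
    for i in range(257):
        data = str(i).encode("ascii")
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data
    raise AssertionError("unreachable: 257 inputs, 256 hash values")


a, b = find_collision()
# Different data, same (truncated) hash.
print(a != b, tiny_hash(a) == tiny_hash(b))
```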

Björn