Re: Definition of "the Git repository"

Hi Kevin,

On 25/06/2021 02:44, Kevin Buckley wrote:
> Hi there,
>
> raising this on the back of a discussion over at the Software
> Carpentry lesson about Git,
>
>    https://github.com/swcarpentry/git-novice/issues/810
>
> I used the book to justify my claim that it is the .git directory
> that is the repository, but I do have to concede that the way that
> the text in section 2.1 of the book reads does suggest that one
> can refer to the working directory PLUS the .git directory as a
> "repository" as well as being able to refer to the .git directory
> alone as the "repository".
>
> The way I think of it,
>
> git init
>
> initialises a Git repository; however, the only thing that changes
> as a result is that a .git directory has been created, ergo, the
> .git directory is the repository.
>
> Furthermore, the fact that one can take the .git directory, move it
> to a new directory and start using it there (very much a nice feature)
> also suggests to me that it is the .git directory that is the repository,
> as distinct from a working directory, under Git control because of the
> existence of a repository within it.
>
> Interested to hear any thoughts around the semantics here,

In general, the Git semantics are confusing.

There are the generic, the conceptual and the implementation meanings,
all carried by the same term, which has to be understood in context (and
that context is itself described with terms that have the same multi-way
ambiguity...). This leapfrogging from concept to implementation and back
again causes a lot of learner confusion.

You have already seen that a source directory can become a repository by
being initialised, and that the primary artefacts are in the .git
sub-directory. One can also include in the generic 'repository' the
various special .git* files (e.g. .gitignore, .gitattributes) that users
add to the main source directory.
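
As a rough illustration of the "working directory plus .git" view, here
is a minimal Python sketch of how a tool might decide that a directory is
under Git control: walk up from the starting directory until a `.git`
directory turns up. The function name `find_git_dir` is hypothetical and
the rules are deliberately simplified (see the docstring); the point is
only that whichever directory holds `.git` is treated as the top of the
work tree, which is why moving the .git directory effectively moves "the
repository".

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_git_dir(start: Path) -> Optional[Path]:
    """Return the nearest `.git` directory at or above `start`, or None.

    Simplified sketch (hypothetical helper): real Git also honours GIT_DIR
    and GIT_CEILING_DIRECTORIES, accepts a `.git` *file* (used by linked
    worktrees and submodules), checks that the directory looks like a valid
    repository, recognises bare repositories and stops at filesystem
    boundaries.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        git_dir = candidate / ".git"
        if git_dir.is_dir():
            return git_dir
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"
        (project / ".git").mkdir(parents=True)   # stand-in for what `git init` creates
        nested = project / "src" / "pkg"
        nested.mkdir(parents=True)
        print(find_git_dir(nested))              # .../project/.git
```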

But it gets worse. In the .git directory there is the 'objects'
directory, which actually holds all the _content_ of the 'repository',
each object named by its hash value. This object store is a superset of
the objects that form the versioned repository structure (the 'DAG'): it
also holds objects belonging to other parts of Git, such as the content
of files in the staging area (Index), and other temporary copies of
'stuff'.
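
To make "each object named by its hash value" concrete, here is a small
Python sketch of how a blob (file content) gets its name: Git hashes a
short "blob <size>" header plus a NUL byte, followed by the raw bytes.
The helper names are hypothetical; the resulting IDs can be checked
against a real install with e.g. `echo hello | git hash-object --stdin`.
The path helper only covers loose (unpacked) objects, not pack files.

```python
import hashlib


def blob_object_id(content: bytes) -> str:
    """SHA-1 object name Git gives `content` when it is stored as a blob.

    The hashed bytes are: b"blob", a space, the byte length in decimal,
    a NUL byte, then the raw content, so the name depends only on content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose object lives: the first two hex digits name a
    sub-directory of .git/objects, the remaining 38 name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"
```

For example, `blob_object_id(b"hello\n")` gives
`ce013625030ba8dba906f756967f9e9ca394464a`, the same name `git
hash-object` reports for a file containing "hello" and a newline.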

Meanwhile, to make the repository structure work, there are 'branch'
pointers (see 'refs/heads' [0]) that each hold the hash of one specific
commit object: the _starting point_ (the branch tip) from which the
branch's chain of commits, and their content, is reached by following
parent links.
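
A toy Python model of the branch pointers described above may help:
each ref name maps to a single tip commit, and the rest of the branch is
found by following parent links. Everything here is hypothetical and
simplified (short labels instead of 40-hex SHA-1 names, no trees or
blobs); on a real repository, `git rev-parse main` resolves a ref and
`git rev-list main` lists the reachable commits. The merge commit shows
why the structure is a DAG rather than a strict linked list.

```python
from typing import Dict, List, Set

# Hypothetical toy history: short labels stand in for 40-hex SHA-1 commit
# names, and each commit records only the names of its parent commits.
PARENTS: Dict[str, List[str]] = {
    "c1": [],            # root commit
    "c2": ["c1"],
    "c3": ["c2"],
    "f1": ["c2"],        # work on a feature branch
    "m1": ["c3", "f1"],  # merge commit: two parents, so history is a DAG
}

# Branch refs: each name (which merely looks like a path) points at one tip.
REFS: Dict[str, str] = {
    "refs/heads/main": "m1",
    "refs/heads/feature": "f1",
}


def reachable(ref: str) -> Set[str]:
    """All commits reachable from the ref's tip by following parent links."""
    seen: Set[str] = set()
    stack = [REFS[ref]]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen
```

Moving a branch is just rewriting one entry in `REFS`; no commit changes.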

The use of write-once, unique object hash names [1] is part of the
implementation 'trickery' that allows Git to work so well, but it also
helps confuse those who think the object store is synonymous with the
repository (e.g. all the new learners...).
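
The write-once naming can be sketched as a toy content-addressed store in
Python. The `ObjectStore` class is hypothetical, not Git's code; it only
borrows Git's "<type> <size>" + NUL header convention for the hash. Because
the name is derived from the content, storing the same content twice yields
the same name and nothing is overwritten.

```python
import hashlib
from typing import Dict, Tuple


class ObjectStore:
    """Toy content-addressed, write-once object store (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[str, bytes]] = {}

    def put(self, kind: str, content: bytes) -> str:
        """Store `content` under the SHA-1 of a Git-style header + content."""
        header = f"{kind} {len(content)}".encode("ascii") + b"\x00"
        name = hashlib.sha1(header + content).hexdigest()
        # Write-once: an existing name already holds identical content,
        # so setdefault leaves it untouched.
        self._objects.setdefault(name, (kind, content))
        return name

    def get(self, name: str) -> bytes:
        """Return the stored content for an object name."""
        return self._objects[name][1]

    def __len__(self) -> int:
        return len(self._objects)
```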

In summary, everyone is right, as long as they are clear about the
context. Which they rarely are.

Philip

[0] the ref names are just strings that conveniently look like POSIX
paths (e.g. refs/heads/main) ...
[1] the current hash is the 160-bit SHA-1, so an accidental collision
only becomes likely after roughly 2^80 (about 10^24) objects: unique
enough ;-)
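
The 10^24 figure in footnote [1] matches the birthday bound for a 160-bit
hash: with n objects, the chance of any accidental collision is roughly
1 - exp(-n(n-1) / 2^161), which only approaches 1/2 at about 2^80 (around
10^24) objects. A short Python check, assuming SHA-1 behaves like a
uniformly random function; it says nothing about deliberately crafted
collisions, and the function names are hypothetical.

```python
import math

HASH_BITS = 160  # SHA-1 output size in bits


def collision_probability(n_objects: float, bits: int = HASH_BITS) -> float:
    """Approximate chance that any two of n random hashes collide."""
    space = 2.0 ** bits
    # p ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for tiny p.
    return -math.expm1(-n_objects * (n_objects - 1) / (2.0 * space))


def objects_for_probability(p: float, bits: int = HASH_BITS) -> float:
    """Roughly how many objects are needed before the collision chance hits p."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))
```

Even a billion objects gives a collision chance of only about 3 in 10^31.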


