Re: Definition of "the Git repository"

Igor Djordjevic <igor.d.djordjevic@xxxxxxxxx> · Fri, 25 Jun 2021 10:56:28 +0200

On 25/06/2021 03:44, Kevin Buckley wrote:
> 
> raising this on the back of a discussion over at the Software
> Carpentry lesson about Git,
> 
>    https://github.com/swcarpentry/git-novice/issues/810
> 
> I used the book to justify my claim that it is the .git directory
> that is the repository, but I do have to concede that the way that
> the text in section 2.1 of the book reads, does suggest that one
> can refer to the working directory PLUS the .git directory as a
> "repository" as well as being able to refer to the .git directory
> alone as the "repository".
> 
> In the way I think of it
> 
> git init
> 
> initialises a Git repository, however, the only thing that changes
> as a result is that a .git directory has been created, ergo, the
> .git directory is the repository.
> 
> Furthermore, the fact that one can take the .git directory, move it
> to a new directory and start using it there (very much a nice feature)
> also suggests to me that it is the .git directory that is the repository,
> as distinct from a working directory, under Git control because of the
> existence of a repository within it.
> 
> Interested to hear any thoughts around the semantics here,

Thinking out loud, and without discussing various places where "Git 
repository" might be described one way or the other, a "repository" 
is a place where *something* is *stored*.

"Source code repository" would thus be a place where source code is 
stored, possibly with some metadata (current version, last change, 
etc.), but not necessarily the whole (versioned) history. For storage 
purpose alone, Git's own working tree could be then considered a 
repository in its own right (source code repository, if it contains 
source code, but it could contain other stuff as well, in addition or 
standalone). But as soon as you start working in it, it's no longer 
(only) storage (so not a repository), but a working area. 
It's more of a conceptual thing.

So if we strictly speak of "Git repository", I think it should be a 
place where Git keeps (stores) your (committed) work, alongside its 
own (meta)data - and that is the ".git" directory, indeed. Seems 
simple enough :)

One thing that might add to the confusion is the notion of a 
"bare repository" for the ".git" directory alone (without the working 
tree), which would then imply that ".git" + working tree is in fact 
"a repository"... which it is, but bare with me - pun intended :)

As Git is mainly used to version artifacts being more or less actively 
worked on (changing, that is), one needs a working area in order to 
do the actual work, thus we have a working tree happily and conveniently 
provided by Git by default, as part of *working with* a Git repository.

As you said, having the ".git" directory alone is enough to recreate 
the contents of the working tree, whereas if you had the working tree 
alone, even if that could be considered to be "a repository" (for 
storage, not work), you would definitely not have "a Git repository" 
(no ".git" directory).
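
To make that concrete, here is a minimal sketch of moving ".git" 
elsewhere and getting the working tree back from it. The directory 
names ("project", "moved"), the file name and the commit identity are 
made up for illustration, and I'm assuming a reasonably recent Git 
with default configuration:

```shell
set -e
demo=$(mktemp -d)          # scratch area, hypothetical layout
cd "$demo"

# Create a repository with one committed file.
git init -q project
cd project
echo "hello" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "first commit"

# Move ONLY the .git directory to a fresh, empty directory.
mkdir ../moved
mv .git ../moved/.git
cd ../moved

# The new directory has no files yet, but Git knows what belongs here.
# Restore the working tree contents from what is stored in .git.
git checkout -q -- .
restored=$(cat file.txt)
echo "$restored"
```

Going the other way doesn't work: what is left behind in "project" 
is just a directory with file.txt in it, and Git commands run there 
no longer find a repository.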

Also, when you work with a remote Git repository, it's only the 
committed stuff you can work with - what's inside ".git". You have no 
idea of the contents of a working tree (and ideally you don't even 
know whether one exists, though that's not always the case, like if 
you try to push to a checked out branch).
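
The checked out branch case can be sketched like this. The names 
"upstream" and "clone" and the commit identity are my own, and the 
refusal relies on Git's default receive.denyCurrentBranch behaviour, 
so a differently configured setup may act differently:

```shell
set -e
demo=$(mktemp -d)          # scratch area, hypothetical layout
cd "$demo"

# A non-bare "remote": it has a working tree with a branch checked out.
git init -q upstream
git -C upstream -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "initial"

# Clone it and add a commit on the same branch.
git clone -q upstream clone
git -C clone -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "second"

# Try to push to the branch that "upstream" currently has checked out.
if git -C clone push -q origin HEAD 2>/dev/null; then
    push_result=accepted
else
    push_result=refused
fi
echo "$push_result"
```

This is one of the few moments where the remote's working tree 
"leaks" through: Git declines the push because accepting it would 
leave that working tree out of sync with the branch it has checked 
out.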

For some additional understanding, I guess we can compare "repository" 
with "archive", possibly being a more familiar concept - you can 
store source code somewhere, and that's your archive. You don't work 
on it in there, as then it would not be an archive anymore, but you 
keep it as a backup which you can always retrieve if needed.

If you use ZIP to compress your archive, it's then a "ZIP archive". 
And most of the time these two would in fact be interchangeable 
- you can have one or the other, being able to recreate one from the 
other (unlike with ".git" and working tree).

BUT, if you add some additional (meta)data to your ZIP archive - like 
password, description, etc. - then the two are not interchangeable 
anymore, as the "ZIP archive" is no longer the same as the "archive" 
and can't be recreated from it.

To conclude - Git's concept of a "working tree" alone could be "a 
repository" (used for storage, not working), but it's not "a Git 
repository" without the ".git" directory in it. On the other hand, 
while a "Git repository" must have a ".git" directory, it can have a 
"working tree", too - but it doesn't need to (called "bare [Git] 
repository" in such a case), as the working tree can be completely 
recreated from the ".git" directory alone. It's a mere convenience 
for doing the actual (development) work (and not required for a "Git 
repository" to be a "repository").
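
A quick way to see the two flavours side by side, as a sketch (the 
directory names "with-tree" and "bare.git" are just my choice). Note 
that in the bare case the directory itself holds what would otherwise 
live inside ".git":

```shell
set -e
demo=$(mktemp -d)          # scratch area, hypothetical layout
cd "$demo"

# Regular repository: a working tree with a .git directory inside it.
git init -q with-tree

# Bare repository: no working tree, the directory IS the Git data.
git init -q --bare bare.git

non_bare=$(git -C with-tree rev-parse --is-bare-repository)
bare=$(git -C bare.git rev-parse --is-bare-repository)
echo "with-tree: $non_bare"
echo "bare.git:  $bare"

# The only thing "git init" put into with-tree is the .git directory.
ls -A with-tree
```

Files like HEAD and directories like objects/ and refs/ sit directly 
in bare.git, and under with-tree/.git in the regular case.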

Finally, as "Git repository" could be referred to as only a 
"repository" for brevity (which it is, in general), it's important to 
notice the latter might be ambiguous (as it does not imply "Git 
repository" in particular). Thus, use "Git repository" when clarity 
is paramount, indicating that you're interested in the ".git" 
directory precisely (and possibly, but not necessarily, the working 
tree as well). If interested in the ".git" directory alone, "bare Git 
repository" is the most precise term.

Regards, Buga