Re: Git terminology: remote, add, track, stage, etc.

Matthieu Moy <Matthieu.Moy@xxxxxxxxxxxxxxx> · Mon, 18 Oct 2010 23:41:21 +0200

Thore Husfeldt <thore.husfeldt@xxxxxxxxx> writes:

> Read it as the friendly, but
> somewhat exasperated suggestions of a newcomer. I'd love to help (in
> the form of submitting patches to the documentation or CLI responses),
> but I'd like to test the waters first.

(It's common practice here to test the waters with RFC/PATCHes too.)

> There are at least two uses of the word *tracking* in Git's
> terminology.

Actually, there's a third, known to be rather unfortunate.

For example, when you clone a repository, by default, you end up with

1) The master branch hosted remotely
2) origin/master, locally, but "remote-tracking"
3) master, your working branch.

When you do a "git pull" while sitting on local branch master, Git
knows it must:

a) fetch (i.e. download) from branch 1) into branch 2)
b) merge from branch 2) into branch 3)

Rule a) comes from remote.<remotename>.fetch, and rule b) comes from
branch.master.merge in your .git/config.

Usually, we say "tracking branch" to refer to rule a), but the "track"
in "git branch --track" means "set up Git for rule b) above".
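
To make the two rules concrete, here is a toy Python model of the
situation right after a clone. It is only a sketch, not how Git is
implemented: commits are plain strings, only the fast-forward case of a
merge is modelled, and the fetch()/pull() helpers are made up for the
illustration. The config values are the usual defaults after a clone;
branch.master.remote is not mentioned above but is the entry that names
the remote to use.

```python
# Toy model of "git pull" on local branch master (NOT Git's implementation).
# Branches map a ref name to the commit they point at; commits are strings.

# Simplified view of the relevant .git/config entries after a clone.
config = {
    "remote.origin.fetch": "+refs/heads/*:refs/remotes/origin/*",  # rule a)
    "branch.master.remote": "origin",
    "branch.master.merge": "refs/heads/master",                    # rule b)
}

remote_repo = {"refs/heads/master": "C3"}    # 1) master hosted remotely
local_repo = {
    "refs/remotes/origin/master": "C1",      # 2) remote-tracking branch
    "refs/heads/master": "C1",               # 3) your working branch
}


def fetch(remote_name):
    """Rule a): copy remote branches into the remote-tracking branches."""
    refspec = config[f"remote.{remote_name}.fetch"].lstrip("+")
    src_pattern, dst_pattern = refspec.split(":")
    src_prefix = src_pattern.rstrip("*")
    dst_prefix = dst_pattern.rstrip("*")
    for ref, commit in remote_repo.items():
        if ref.startswith(src_prefix):
            local_repo[dst_prefix + ref[len(src_prefix):]] = commit


def pull(branch):
    """Fetch, then merge the configured upstream into the local branch."""
    remote_name = config[f"branch.{branch}.remote"]
    fetch(remote_name)
    upstream = config[f"branch.{branch}.merge"]  # branch name on the remote
    tracking = upstream.replace(
        "refs/heads/", f"refs/remotes/{remote_name}/", 1
    )
    # Rule b): a real merge may create a merge commit; this sketch only
    # models the fast-forward case.
    local_repo[f"refs/heads/{branch}"] = local_repo[tracking]


pull("master")
print(local_repo)
```

After pull("master"), both origin/master and master point at the commit
that the remote master had.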

We already came up with a better wording, namely "upstream", and used
it in "git push --set-upstream". A next step would probably be to
deprecate every other occurrence of --track that means the same thing
(git checkout --track seems to me to be a candidate; git branch has
both --track and --set-upstream). One difficulty is doing that with
backward compatibility in mind.

> 3. Duplicate various occurrences of `cached` flags as `staged` (and
> change the documentation and man pages accordingly), so as to have,
> e.g., `git diff --staged`.

I do like this, but to be complete, one should also deal with more
complex cases. For example, "git apply" has _both_ --index and
--cached, with different semantics.
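
As a rough illustration of that difference, here is a toy Python model.
It is a sketch under simplifying assumptions, not Git's code: a "patch"
is just a mapping from path to new content, and apply_patch() is a
made-up helper. Plain "git apply" touches only the working tree,
--cached touches only the index, and --index touches both and refuses
to run when the two disagree on a patched file.

```python
def apply_patch(patch, worktree, index, mode=None):
    """Apply a toy patch (path -> new content).

    mode is None (plain "git apply"), "--index" or "--cached".
    """
    if mode == "--index":
        # With --index, the patched files must be identical in the
        # index and in the working tree before anything is changed.
        for path in patch:
            if worktree.get(path) != index.get(path):
                raise ValueError(f"{path}: does not match index")
    for path, content in patch.items():
        if mode in (None, "--index"):
            worktree[path] = content      # working tree is updated
        if mode in ("--index", "--cached"):
            index[path] = content         # index is updated


patch = {"f.txt": "new"}

wt, idx = {"f.txt": "old"}, {"f.txt": "old"}
apply_patch(patch, wt, idx)                 # working tree only
plain = (wt["f.txt"], idx["f.txt"])

wt, idx = {"f.txt": "old"}, {"f.txt": "old"}
apply_patch(patch, wt, idx, "--cached")     # index only
cached = (wt["f.txt"], idx["f.txt"])

wt, idx = {"f.txt": "old"}, {"f.txt": "old"}
apply_patch(patch, wt, idx, "--index")      # both
both = (wt["f.txt"], idx["f.txt"])

print(plain, cached, both)
```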

And changing just _some_ of the occurrences of --index and --cached may
help, but it does not fix the problem of inconsistencies. Up to now,
there have been many efforts towards consistency, but I guess no one
has had the courage to take an approach global enough to eliminate all
inconsistencies.

In other words, I encourage you to continue the effort you've stated
here, but that won't help much unless you push the idea far enough
IMHO.

>     changed but not updated:
>
> I'm still not sure what "update" was ever supposed to mean in this
> sentence.

Historically, the staging area was seen as a cache (hence the name),
which was purposely out of date when doing a partial commit. Hence,
Git inherited some of the terminology of usual caches (a cache is
"dirty" when it's not in sync with what it caches, "clean" when it is,
and you "update" it to make it in sync).
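
In that cache vocabulary, "changed but not updated" simply lists the
dirty entries. A toy Python sketch, with made-up helper names and
contents reduced to plain strings (this is not how Git stores
anything):

```python
# Toy model: the index seen as a cache of the working tree.
working_tree = {"a.txt": "v2", "b.txt": "v1"}
index = {"a.txt": "v1", "b.txt": "v1"}   # a.txt is stale in the cache


def dirty_paths():
    """Paths whose cached content is out of sync with the working tree."""
    return sorted(p for p in working_tree if index.get(p) != working_tree[p])


def update(path):
    """Bring one cache entry back in sync with the working tree."""
    index[path] = working_tree[path]


before = dirty_paths()   # the "changed but not updated" list
update("a.txt")
after = dirty_paths()
print(before, after)
```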

But I do agree that the analogy with a cache is confusing for the
user, even if it's meaningful for the developer: from the user's point
of view, a cache is meant to be a performance optimization and is not
supposed to interfere with the functionality.

> 2.
>     Untracked files:
>     (use "git add <file>..." to include in what will be committed)
>
> should be
>
>     Untracked files:
>     (use "git track <file>" to track)

This hypothetical "git track" actually exists under the name "git add
-N".

> The opposite of staging is `git
> reset HEAD <file>` and the opposite of tracking is -- well, I'm not
> sure, actually. Maybe `git update-index --force-remove <filename>`?

git rm --cached ?

As a mere mortal, you shouldn't need update-index; it's a plumbing
command (i.e. meant for scripts or low-level manipulations).

> An even more radical suggestion (which would take all of 20 seconds to
> implement) is to introduce `git track` as another alias for `git
> add`. (See above under `git status`). This would be especially useful
> if tracking *branches* no longer existed.

I disagree that adding aliases would help users. Look at your own
confusion, and then the relief when you found out that index, cache,
and staging area were synonymous. Now, what should a user think after
learning stage, track and add, and asking what the difference is?

I agree that adding new files and adding new content to existing files
are done for different reasons, but the conceptual simplicity of Git
comes from the fact that Git is purely snapshot-oriented, and, to
some extent, it's nice to have this reflected in the user interface.

When you say "git add X", you don't talk about the difference between
the previous commit and the next, or about the difference between
working tree and next commit, or so. You're basically saying "file X
will exist in the next commit, and it will have this content". Whether
it existed or not in the previous commit doesn't matter. It's
implemented this way, and it's really something fundamental in the Git
model.
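
A toy Python sketch of that view, with the staged snapshot reduced to a
mapping from path to content (the helpers are made up for the
illustration; this is not Git's implementation). Note that add() does
not care whether the path was already staged, and that the model's
counterpart of "git rm --cached" only touches the staged snapshot:

```python
# Toy model: the staged snapshot is "what the next commit will contain".
worktree = {"old.txt": "edited", "new.txt": "hello"}
index = {"old.txt": "original"}   # new.txt is not in the snapshot yet


def add(path):
    """Like "git add X": X will exist in the next commit with this content."""
    index[path] = worktree[path]  # same operation for new and known paths


def rm_cached(path):
    """Like "git rm --cached X": drop X from the snapshot, keep the file."""
    index.pop(path)


add("old.txt")   # existing file, new content
add("new.txt")   # new file
staged_after_add = dict(index)

rm_cached("new.txt")
print(staged_after_add, index, worktree)
```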

> There's another issue with this, namely that "added files are
> immediately staged". In fact, I do understand why Git does that, but
> conceptually it's pure evil: one of the conceptual cornerstones of
> Git -- that files can be tracked and changed yet not staged,

Rephrase that as "the working tree can have content different from the
staged content". Both "working tree content" and "staged content" are
snapshots (i.e. they exist independently of each other). Then newly
created files won't be a special case anymore. Files exist, with some
(possibly empty) content, or they don't.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/