Re: Git terminology: remote, add, track, stage, etc.

Thore Husfeldt <thore.husfeldt@xxxxxxxxx> writes:

> I've just learned Git. What a wonderful system, thanks for building
> it. 
> 
> And what an annoying learning experience. 
> 
> I promised myself to try to remember what made it all so hard, and to
> write it down in a comprehensive and possibly even constructive
> fashion. Here it is, for what it's worth. Read it as the friendly, but
> somewhat exasperated suggestions of a newcomer. I'd love to help (in
> the form of submitting patches to the documentation or CLI responses),
> but I'd like to test the waters first.

Thank you very much for writing those down.  It is very helpful for
those of us who are used to Git and know its sometimes obscure jargon
by heart, and so might not notice that it is hard to understand.
 

> Remote (tracking) branches
> --------------------------
> 
> There are at least two uses of the word *tracking* in Git's
> terminology.
> 
> The first, used in the form `git tracks a file' (in the sense that Git
> knows about the file) is harmless enough, and is handled under `git
> add` below.

In this sense of "tracked", i.e. "tracked file", it means that the
given file is versioned, i.e. is under version control.

Though I don't think we use `git tracks a file` anywhere in the
documentation and messages (at least I hope so); we use `tracked file`.
I think it is all right for `tracked file` and `"tracked" branch`
to mean different things.


> But the real monster is the *tracking branch*, sometimes called the
> remote branch, the remote-tracking branch, or the remote tracking
> branch.  Boy did that ever confuse me. [...]
> 
> Please, *please* fix this. It was the single most confusing and
> annoying part of learning Git.
> 
> First, the word, "tracking". These branches don't track or follow
> anything.  They are standing completely still.  Please believe me that
> when first you are led to believe that origin/master tracks a branch
> on the remote (like a hound tracks its quarry, or a radar tracks a
> flight) that it is very difficult to hunt this misunderstanding down:
> I believed for a long time that the tracking branch stayed in sync,
> automagically, with a synonymous branch at the remote.

But those 'remote-tracking branches' are *used* to track where the
branches in the remote repository are.
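
To make that concrete, here is a minimal sketch (the scratch directory
and the repository names `upstream` and `clone` are made up for the
example) showing that 'origin/master' stays put when the remote
repository gains a commit, and only moves when you fetch:

```shell
set -eu
# Throwaway identity and scratch directory so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

# "upstream" plays the role of the remote repository.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"

# Cloning creates the remote-tracking branch origin/master.
git clone -q upstream clone

# A new commit appears in the remote repository...
git -C upstream commit -q --allow-empty -m "second"
remote_tip=$(git -C upstream rev-parse master)

# ...but origin/master in the clone has not moved.
before=$(git -C clone rev-parse origin/master)

# Only an explicit fetch updates the remote-tracking branch.
git -C clone fetch -q origin
after=$(git -C clone rev-parse origin/master)

echo "remote tip:   $remote_tip"
echo "before fetch: $before"
echo "after fetch:  $after"
```

So the remote-tracking branch records where the remote branch was the
last time you talked to the remote, not where it is right now.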

Sidenote: be thankful that you didn't start using Git before version
1.5.0, when the so-called "separate remote" layout was made the default
(which means tracking branch 'foo' in remote 'origin' using the
'origin/foo' remote-tracking branch).

[...]

> The hyphenated *remote-tracking* is a lot better terminology already
> (and sometimes even used in the documentation), because at least it
> doesn't pretend to be a remote branch (`git branch -r`, of course,
> still does). So that single hyphen already does some good, and should
> be edited for consistency. [...]

The name 'remote-tracking branch' is the one we arrived at after long
discussions not that long ago, and it is the name that should be used
throughout the documentation.  It is an ongoing effort.

> [...] It may be that terminology is slowly converging. (To something
> confusing, but still...)

[...]

> More radically, I am sure some head scratching would be able to find
> useful terminology for master, origin/master, and origin's master. I'd
> love to see suggestions. As I said, I admire how wonderfully simple
> and clean this has been implemented, and the documentation, CLI, and
> terminology should reflect that.

There is also the additional complication that the relation the local
branch 'master' has to the 'origin/master' remote-tracking branch can
also exist between two local branches.

We nowadays say that 'origin/master' is the "upstream" of the 'master'
branch; we used to say that the 'master' branch "tracks" the
'origin/master' branch (which can still be seen in the name of the
`--track` option to 'git branch').
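
A small sketch of both cases, assuming a reasonably recent Git that
understands the `@{upstream}` notation; the branch names `topic` and
`topic2` and the scratch repositories are invented for the example:

```shell
set -eu
# Throwaway identity and scratch directory so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"
git clone -q upstream clone
cd clone

# A local branch whose upstream is the remote-tracking branch origin/master.
git branch -q --track topic origin/master
# A local branch whose upstream is another *local* branch.
git branch -q --track topic2 master

# Ask Git what each branch considers its upstream.
up_master=$(git rev-parse --abbrev-ref 'master@{upstream}')
up_topic=$(git rev-parse --abbrev-ref 'topic@{upstream}')
up_topic2=$(git rev-parse --abbrev-ref 'topic2@{upstream}')

echo "master -> $up_master"
echo "topic  -> $up_topic"
echo "topic2 -> $up_topic2"
```

Here 'master' and 'topic' both have 'origin/master' as upstream, while
'topic2' has the local 'master' as upstream.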
 
> The staging area
> ----------------
> 
> The wonderful and central concept of staging area exists under at
> least three names in Git terminology. And that's really, really
> annoying. The index, the cache, and the staging area are all the same,
> which is a huge revelation to a newcomer.

This inconsistency is the result of history; the concrete object that
is used as the mediator between the working area and the repository
was first called 'dircache', and is now called 'the index'.

There has been a strong push towards replacing 'index' and 'cache' with
'staging area' (and 'to stage' as the verb), but it has met with some
resistance.


> 2. Introduce the alias `git unstage` for `git reset HEAD` in the
> standard distribution.

That is IMHO a very good idea.  The `git unstage <file>` form
describes what we want to achieve (user story), while `git reset HEAD
<file>` requires us to know which operation we must perform in order
to remove a file's staged changes.
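
Until something like that ships, anybody can define such an alias
themselves.  A minimal sketch: the alias below is a per-repository
configuration setting, not a built-in command, and the trailing `--` is
my addition to keep file names from being mistaken for revisions.

```shell
set -eu
# Throwaway identity and scratch directory so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git commit -q --allow-empty -m "initial"   # HEAD must exist for `reset HEAD`

# User-defined alias: `git unstage <file>` runs `git reset HEAD -- <file>`.
git config alias.unstage 'reset HEAD --'

echo "draft" > notes.txt
git add notes.txt
staged_before=$(git diff --cached --name-only)

git unstage notes.txt
staged_after=$(git diff --cached --name-only)

echo "staged before: [$staged_before]"
echo "staged after:  [$staged_after]"
```

The file itself is left alone in the working area; only the staged
change is removed.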
 
> 3. Duplicate various occurrences of `cached` flags as `staged` (and
> change the documentation and man pages accordingly), so as to have,
> e.g., `git diff --staged`.

Note that it is not as easy as it seems at first glance.  There are
*two* such options, which (as you can read in the gitcli(7) manpage)
have slightly different meanings:

 * The `--cached` option is used to ask a command that
   usually works on files in the working tree to *only* work
   with the index.  For example, `git grep`, when used
   without a commit to specify from which commit to look for
   strings in, usually works on files in the working tree,
   but with the `--cached` option, it looks for strings in
   the index.

 * The `--index` option is used to ask a command that
   usually works on files in the working tree to *also*
   affect the index.  For example, `git stash apply` usually
   merges changes recorded in a stash to the working tree,
   but with the `--index` option, it also merges changes to
   the index as well.

Some commands like `git apply` support both (though not at the same
time).
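
The `--cached` half of this can be seen with `git grep`.  A minimal
sketch (the file name and contents are invented for the example): one
version of the file is staged, then the working tree is changed again,
so the two searches look at different content.

```shell
set -eu
# Throwaway identity and scratch directory so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

echo "first version" > hay.txt
git add hay.txt
git commit -q -m "add hay.txt"

# Stage one version, then change the working tree again.
echo "staged needle" > hay.txt
git add hay.txt
echo "working tree only" > hay.txt

# Without --cached, git grep searches tracked files in the working tree.
# (git grep exits non-zero when nothing matches, hence the `|| true`.)
in_worktree=$(git grep -l needle || true)

# With --cached, it searches the content recorded in the index.
in_index=$(git grep --cached -l needle || true)

echo "working tree matches: [$in_worktree]"
echo "index matches:        [$in_index]"
```

Only the `--cached` search finds the word, because it exists in the
index but no longer in the working tree.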


> git status
> ----------

[...]
> 2.
>     Untracked files:
>     (use "git add <file>..." to include in what will be committed)
> 
> should be
> 
>     Untracked files:
>     (use "git track <file>" to track)

To "track a file" means to put a file under version control (to
version control the file).

Note also that "git track <file>" would be "git add -N <file>" 
(where `-N` is `--intent-to-add`), which only marks a file to be
tracked / versioned, but doesn't stage its contents.
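
For illustration, a minimal sketch of what `git add -N` does (the file
name is invented; how `git status` and `git diff --cached` present such
an entry differs between Git versions, so the sketch avoids relying on
that):

```shell
set -eu
# Throwaway identity and scratch directory so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git commit -q --allow-empty -m "initial"

echo "hello" > new.txt

# Mark the file as tracked without staging its contents
# (same as: git add --intent-to-add new.txt).
git add -N new.txt

# The path is now known to Git...
tracked=$(git ls-files new.txt)

# ...but its contents still count as an unstaged working-tree change.
unstaged=$(git diff --name-only)

echo "tracked:  [$tracked]"
echo "unstaged: [$unstaged]"
```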
 
> Adding
> ------
> 
> The tutorial tells us that 
> 
>     Many revision control systems provide an add command that tells
>     the system to start tracking changes to a new file. Git's add
>     command does something simpler and more powerful: git add is used
>     both for new and newly modified files, and in both cases it takes
>     a snapshot of the given files and stages that content in the
>     index, ready for inclusion in the next commit.
> 
> This is true, and once you grok how Git actually works it also makes
> complete sense. `Making the file known to Git' (sometimes called
> `tracking the file') and `staging for the next commit' result in the
> exact same operations, from Git's perspective.
> 
> But this is a good example of what's wrong with the way the
> documentation thinks: Git's implementation perspective should not
> define how concepts are explained. In particular, *tracking* (in the
> sense of making a file known to git) and *staging* are conceptually
> different things.

But they are not independent.  When you stage the contents of a file
that was not known to Git, the file automatically becomes "tracked",
i.e. is put under version control.  Obvious.

>                    In fact, the two things remain conceptually
> different later on: un-tracking (removing the file from Git's
> worldview) and un-staging are not the same thing at all, neither
> conceptually nor implementationally. The opposite of staging is `git
> reset HEAD <file>` and the opposite of tracking is -- well, I'm not
> sure, actually. Maybe `git update-index --force-remove <filename>`?

Use `git rm <filename>` to remove it from both the staging area and
the working area, or `git rm --cached <filename>` to remove it only
from the staging area, which means that it is removed from version
control but kept on disk.
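
A minimal sketch of the difference, with invented file names:

```shell
set -eu
# Throwaway identity and scratch directory so the sketch is self-contained.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

echo a > keep.txt
echo b > drop.txt
git add keep.txt drop.txt
git commit -q -m "add two files"

# Untrack keep.txt but leave it on disk.
git rm -q --cached keep.txt

# Untrack drop.txt and delete it from the working area as well.
git rm -q drop.txt

tracked=$(git ls-files)
echo "still tracked: [$tracked]"
ls
```

After this, neither file is in the staging area, and only keep.txt is
still on disk (as an untracked file).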

[...]

> Fixing this requires no change to the implementation. `git stage` is
> already a synonym for `git add`. It merely requires discipline in
> using the terminology of staging. Note that it is completely valid to
> tell the reader, maybe immediately and in a footnote, that `git add`
> and `git stage` *are* indeed synonyms, because of Git's elegant
> model. In fact, given the amount of documentation cruft one can find
> on the Internet, this would be a welcome footnote.
> 
> An even more radical suggestion (which would take all of 20 seconds to
> implement) is to introduce `git track` as another alias for `git
> add`. (See above under `git status`). This would be especially useful
> if tracking *branches* no longer existed.

Well, there is a different suggestion: make `git stage`, `git track`
and `git mark-resolved` *specializations* of `git add`, with added
safety checks: 'git stage' would work only on files already known to
Git / under version control, 'git track' would work only on untracked
files (and do what 'git add -N' does), and 'git mark-resolved' would
work only on files which were part of a merge conflict.
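
To show what those safety checks might look like, here is a rough
sketch written as shell functions.  These are purely hypothetical
helpers (`is_tracked`, `stage`, `track`, `mark_resolved`); none of them
is an existing Git command, and a real implementation would probably
handle multiple paths and options.

```shell
# Hypothetical helpers; none of these are real Git commands.

# is_tracked FILE: succeeds if FILE has an entry in the index.
is_tracked() {
    git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

# stage FILE: like `git add`, but only for files already under version control.
stage() {
    is_tracked "$1" || { echo "stage: '$1' is not tracked" >&2; return 1; }
    git add -- "$1"
}

# track FILE: like `git add -N`, but only for files not yet tracked.
track() {
    if is_tracked "$1"; then
        echo "track: '$1' is already tracked" >&2
        return 1
    fi
    git add -N -- "$1"
}

# mark_resolved FILE: like `git add`, but only for files with unmerged entries.
mark_resolved() {
    [ -n "$(git ls-files -u -- "$1")" ] ||
        { echo "mark_resolved: '$1' has no conflict" >&2; return 1; }
    git add -- "$1"
}

# Small demonstration in a scratch repository.
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo x > new.txt

stage new.txt 2>/dev/null && stage_untracked=ok || stage_untracked=refused
track new.txt
stage new.txt && stage_tracked=ok || stage_tracked=refused
track new.txt 2>/dev/null && track_again=ok || track_again=refused
mark_resolved new.txt 2>/dev/null && resolved=ok || resolved=refused

echo "stage (untracked):  $stage_untracked"
echo "stage (tracked):    $stage_tracked"
echo "track (again):      $track_again"
echo "mark_resolved:      $resolved"
```

Each helper ends up calling plain `git add`; only the precondition it
checks first differs.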
 
> There's another issue with this, namely that "added files are
> immediately staged". In fact, I do understand why Git does that, but
> conceptually it's pure evil: one of the conceptual cornerstones of
> Git -- that files can be tracked and changed yet not staged, i.e., the
> staging area is conceptually a first-class citizen -- is violated
> every time a new file is "born". Newborn files are *special* until
> their first commit, and that's a shame, because the first thing the
> new file (and, vicariously, the new user) experiences is an
> aberration. I admit that I have not thought this through.

-- 
Jakub Narebski
Poland
ShadeHawk on #git