Re: Consistent terminology: cached/staged/index

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 13 Feb 2011 14:58:32 -0800

Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

> If I understand correctly, the intended semantics are:
>
> --index versus --cached
> ~~~~~~~~~~~~~~~~~~~~~~~
> The place where changes for the next commit get registered is called
> the "index file".
>
> Commands that pay attention to the registered content of files rather
> than the copies in the work tree use the option name "--cached".  This
> is mostly for historical reasons --- early on, it was not obvious that
> making the index not match the worktree was going to be useful.
>
> Commands that update the registered content of files in addition to
> the worktree use the option name "--index".

Mostly correct, except the "early on, it was not obvious" part.  It was
very obvious from the early days that unlike "cvs commit" or "svn commit"
it was very useful that you can trust "git commit", after preparing the
index with what is and isn't to be included in the commit, won't pick up
debugging cruft you keep around in the working tree.

"cache" was an old name (and still established name in-use in the code)
for the index.  Some commands make sense to affect both the index and the
working tree (e.g. "apply") and you give --index to mean "both index and
the working tree" while some other operating modes that make sense only to
look at the index, ignoring the potential difference between the working
tree and the index (e.g. again "apply"), iow, taking only the cached
changes into account, are invoked with --cached to mean "look only at what
is recorded in the index".

Some people may find it a good idea to introduce new synonyms --index-only
vs --index-and-working-tree. I personally am not opposed to such a change,
as long as traditional --cached vs --index will keep working for people
who already learned the difference.  These hypothetical new synonyms would
be more descriptive; the necessity to differenciate the two concepts the
two options --cached vs --index try to tell apart is very real, but it was
a hack to use these two particular words --cached vs --index to do so
without trying harder to come up with better words.

> --staged
> ~~~~~~~~
> diff takes --staged, but that is only to support some people's habits.

This one actually needs more historical background to understand why it is
there, as the synonym is not necessary to understand how git works.

Originally, the way to say "what is in the current working tree for this
path is what I want to have in the next commit" was "update-index".  "What
I want to have in the next commit" is "the index", and the operation is
about "updating" that "What I want to have...", so the name of the command
made perfect sense.  "update-index" had a safety valve to prevent careless
invocation of "update-index *" to add all the cruft in the working tree
(there wasn't any .gitignore mechanism in the Porcelain nor in the
plumbing) and by default affected only the paths that are already in the
index.  You needed to say "update-index --add" to include paths that are
not in the index.

A more user friendly Porcelain "git add" was later implemented in terms of
"update-index --add", but originally it was to add new paths; updating the
contents was still done via "update-index" interface.

This changed in v1.5.0, around the beginning of 2007.  Nicolas Pitre among
others realized that git is about tracking contents, not paths, which
meant that "make the content in the working tree at this moment appear in
the next commit" is equivalent to saying "add this _content_ to the set of
contents that make up the next commit".  "git add" learned to accept both
new paths that were not in the index so far and also paths known to the
index that had old contents for them.

Before v1.5.0, we explained the concept as "we update the set of contents
to be in the next commit" (hence "update-index"); since v1.5.0, we explain
the concept as "we add what's in these paths to the set of contents to be
in the next commit" (hence "add").

Notice that there is no need for a new terminology "staged" in the above
description?

The semantics of the index didn't change ever since, modulo small tweaks
like "add -i" (I borrowed it from Darcs) that allows us to say "add parts
of the changed content" instead of the "what's in the file as a whole
right now" were added; these small tweaks didn't introduce any conceptual
change.

The term "stage" comes from "staging area", a term people used to explain
the concept of the index by saying "The index holds set of contents to be
made into the next commit; it is _like_ the staging area".

My feeling is that "to stage" is primarily used, outside "git" circle, as
a logistics term.  If you find it easier to visualize the concept of the
index with "staging area" ("an area where troops and equipment in transit
are assembled before a military operation", you may find it easier to say
"stage this path ('git add path')", instead of "adding to the set of
contents...".

Although I tried to use the word myself in earlier days, I have never felt
that "staging area" is a very widely known term for non-native speakers of
English, and personally have tended to avoid using it.  I find "adding to
the set of contents..." somewhat easier to understand regardless of your
language background, but it may be just me who is not a native speaker.

In short, "stage" is an unessential synonym that came much later, and that
is why we avoid advertising it even in the document of "git diff" too
heavily.  Unlike the hypothetical --index-only synonym for --cached I
mentioned earlier that adds real value by being more descriptive, "staged"
does not add much value over what it tried to replace.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html