[RFC] Introduce "git stage" (along with some heresy)

Carl Worth <cworth@xxxxxxxxxx> · Fri, 01 Dec 2006 09:36:00 -0800

[This message, (yes, another long one from me), proposes 3 changes. The
first should be uncontroversial I think, while the second and third
are clear heresy, (and the second would require some amount of
re-training or re-configuration by existing git users). Pick and choose
as you see fit. I don't think they actually depend on each other,
though I'll present them here as parts of a whole.]

Change #1: Add "git stage" command, use "--staged" instead of "--index"
=======================================================================
If we're going to start describing the index as a "staging area" let's
make the command set reflect that as well. I propose a new "git stage"
command that is intended for human use when wanting to do a staged
commit.

Then, a few other commands that currently have --index or --cached
arguments could switch to --staged as well.

With this change here is a summary of some of the primary git commands
(that are relevant to the current discussion):

add		Shove a file's contents into git's staging area

stage		Shove a file's contents into git's staging area

rm		Remove a file from git's staging area

diff		Show what's changed in working tree compared to
		staging area

diff --staged	Show what's changed in staging area compared to latest
		commit

commit		Create a new commit from the contents of the staging area

commit -a	Update the contents of all files in the staging area,
		and create a new commit from the new staging area

commit files...	Create a new commit that differs from the latest
		commit only in files... (which get new content from
		the current working tree). Staged content of other
		files (if any) will not be committed.
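
As a rough illustration of the summary above, here is a toy model in
Python. It is only a sketch under stated assumptions: the Repo class and
its method names are made up for this example, each "tree" is a plain
dict from path to content, and nothing here reflects how git stores
anything internally.

```python
class Repo:
    """Toy model: three dicts mapping path -> content."""

    def __init__(self):
        self.worktree = {}  # files as they exist on disk
        self.staging = {}   # the staging area (index)
        self.head = {}      # contents of the latest commit

    def stage(self, *paths):
        """add / stage: shove working-tree content into the staging area."""
        for p in paths:
            self.staging[p] = self.worktree[p]

    def rm(self, *paths):
        """rm: remove files from the staging area."""
        for p in paths:
            self.staging.pop(p, None)

    def diff(self):
        """diff: working tree compared to the staging area."""
        return {p: (staged, self.worktree.get(p))
                for p, staged in self.staging.items()
                if staged != self.worktree.get(p)}

    def diff_staged(self):
        """diff --staged: staging area compared to the latest commit."""
        paths = set(self.staging) | set(self.head)
        return {p: (self.head.get(p), self.staging.get(p))
                for p in paths
                if self.head.get(p) != self.staging.get(p)}

    def commit(self):
        """commit: new commit from the contents of the staging area."""
        self.head = dict(self.staging)

    def commit_all(self):
        """commit -a: refresh every staged file, then commit."""
        for p in list(self.staging):
            if p in self.worktree:
                self.staging[p] = self.worktree[p]
            else:
                # Assumption of this sketch: a file deleted from the
                # working tree is dropped from the staging area too.
                del self.staging[p]
        self.commit()
```

In this model, editing a file after staging it makes "diff" show the
edit while "diff --staged" shows only what was staged, which is the
distinction the table draws.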

I hope that so far (in this email) I haven't said anything very
contentious. This is basically just a summary of the existing behavior
with things like "update-index" and "--cached" changed to "stage" and
"--staged".

The introduction of this new "stage" command would be a very minor
change. If you're not particularly picky about names, it might be seen
as having no impact at all, (or even slightly negative since "add" and
"stage" could be considered equivalent). If you are picky about names
you might consider it slightly better to "add" when adding a new file
and to "stage" when you want to put some content into the staging area.

OK, so now let me start in with my heresy[*].

To start with I'd like to group the above commands into two groups such
that one can be understood without a need to understand the purpose of
the staging area. Note: the goal here is not to lie about the staging
area. It will still be mentioned in the documentation for any command
that needs to mention it, but in a way that a user can easily ignore
those portions at first. So the grouping is:

Without staging
---------------
add
rm
diff
commit -a
commit files...

With staging
------------
stage
diff --staged
commit

So far, that's just a re-grouping. No names or semantics have been
changed.

Change #2: Make a staged commit an explicit act
===============================================
The "-a" stands out to me here as the only command-line option needed
in the first list, and plain "commit" as the only command in the second
list that performs a staged operation by default. So change number two is to
redefine "commit" to mean what "commit -a" meant before and to require
a new command-line option for staged committing, (the best naming I
have so far is "commit --staged" with a shortcut of "commit -i"---the
mismatch of "'i' as short for --staged" is a bit unlovely I admit).

Here's what we have after change #2:

Without staging
---------------
add
rm
diff
commit
commit files...

With staging
------------
stage
diff --staged
commit --staged (or "commit -i")

Change #3: Change "add" to not stage any content
================================================
To finish off, I'd like to propose descriptions of the commands to
allow the user to use the "without staging" commands as a complete set
while being able to easily ignore any of the staging capabilities.
This does trigger a need for a semantic change in the "add"
command. Here are the proposed descriptions:

Without staging
---------------
add		Add a file to be managed by git

rm		Remove a file to no longer be managed by git

diff		Show the changes in the working tree compared to the
		latest commit, (or compared to staged content, if any)

commit		Commit the current state of all git-managed files

commit files...	Commit the current state of the specified files

With staging
------------
stage		Shove the current contents of the specified files into
		git's staging area

diff --staged	Show the changes in the staging area compared to the
		latest commit

commit --staged	Commit the state of the current staging area
commit -i

To make the above work, I think Daniel's suggestion of making "add"
put 0{40} into the staging area should work just fine. I know that
Linus has religious objections to these proposed new semantics of "git
add". One response there is to just consider "add" to be a mud-pit
command for people to wallow in that really want it, (like Linus'
proposed "ci" command). If you don't want to be in that mud-pit, then
just use my "stage" command along with "commit -i", (or with "commit"
and some configuration option, or with "commit" and a rejection to my
change #2).
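
To make the proposed semantics concrete, here is a small Python sketch.
Everything in it is an assumption made for illustration: the function
names, the use of dicts for the working tree, staging area, and latest
commit, and the reading of 0{40} as a placeholder entry that means
"managed by git, no content staged". It is not a description of any
existing git behavior.

```python
NULL_ID = "0" * 40  # placeholder: path is managed, nothing is staged


def add(staging, path):
    """Proposed 'add': mark a path as managed without staging content."""
    staging.setdefault(path, NULL_ID)


def stage(staging, worktree, path):
    """'stage': shove the file's current content into the staging area."""
    staging[path] = worktree[path]


def diff_baseline(head, staging, path):
    """What the proposed 'diff' compares the working tree against:
    staged content if there is any, otherwise the latest commit."""
    staged = staging.get(path, NULL_ID)
    return head.get(path) if staged == NULL_ID else staged


def commit(staging, worktree):
    """Proposed default 'commit': current state of all managed files.

    Returns (new_head, new_staging). Resetting the staging area to
    match the new commit is an assumption of this sketch.
    """
    new_head = {p: worktree[p] for p in staging if p in worktree}
    return new_head, dict(new_head)
```

The point of the placeholder is that the committed content is whatever
the file holds at commit time, not whatever it held when "add" was run.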

Another response is that these new semantics for "add" really aren't
any worse than other existing things in git, (for example, "git rm"
isn't just updating file content into the index---because it even
leaves the file around by default). [Actually, the fact that "git rm"
doesn't delete the file by default is a bug (and it's my bug). I think
the right thing is that "git rm" should be defined as always deleting
the file from the working tree, and that it should be fixed to fail if
the file is dirty, (unless -f is passed)].

Other examples of the current semantics of git commands being just as
"evil", (I would argue "usable" instead), are below.

I think that here, finally, I've made my proposal as clearly and
consistently as I can. I think the above would only improve git, (by
making it easier to use by new people, while still providing a
consistent model and a way to easily learn everything git has to
offer). Change #2 would be the hardest pill to swallow since it would
mean some change in the habits of existing users, (the other changes
could largely be blissfully ignored by trained git users I
think). This difficulty could be softened with a configuration option
something like core.commitStagedByDefault, or this one change could be
rejected.

-Carl

[*] I say heresy, but I think all the talk about "inconsistency" and
"dishonesty" in the proposals I've been making is really
misplaced. The easiest way to see that is to apply the same arguments
to existing commands in git and see that they are already inconsistent
and dishonest.

Inconsistency
-------------
If the consistent model is "'commit' commits the contents of the
staging area" then what in the world is happening in the case of
"commit files..."? There's really no way to describe that operation in
terms of the staging area, because it simply ignores it. The closest
you could get is to describe the internal implementation in detail:

commit files... Creates a temporary staging area from the latest
		commit, shoves the content of the named files into
		that temporary staging area, creates a new commit from
		that and then does [something] to the original staging
		area.

I (obviously) botched that. Somebody could write an actual, correct
technical description. But you know what? It would be totally
useless. It's really hard to describe what the current command does in
terms of the staging area and nobody would care anyway. It wouldn't
help anybody use the thing. The fact that all commit operations _do_
involve a staging area at some deep point in the implementation is
totally irrelevant to the fact that what "commit files..." does do
_is_ desirable, and is not hard to explain at a conceptual level. What
the current documentation has is:

	"Commit only the files specified on the command line."

This documentation doesn't say _anything_ about the content coming
from the working tree rather than the index. But that's _obviously_
the correct place for the content to come from, and that's what's
implemented.
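
The conceptual behavior is short enough to sketch in Python. This is a
toy model, not git's implementation: commit_only is a made-up name, the
three trees are plain dicts from path to content, and the handling of
the original staging area (the "[something]" above) is purely an
assumption of the sketch.

```python
def commit_only(head, staging, worktree, paths):
    """Toy model of 'commit files...'.

    Returns (new_head, new_staging) without modifying the inputs.
    """
    # Start from the latest commit, not from the staging area, so
    # staged content of other files is left out of the new commit.
    temp = dict(head)
    for p in paths:
        temp[p] = worktree[p]  # content comes from the working tree
    new_head = temp

    # Assumption: the named paths are refreshed in the original
    # staging area; everything else staged there is left alone.
    new_staging = dict(staging)
    for p in paths:
        new_staging[p] = worktree[p]
    return new_head, new_staging
```

With "b" staged but only "a" named on the command line, the new commit
picks up the working-tree content of "a" and keeps the previously
committed content of "b".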

Dishonesty
----------
The argument here is that some "easier to use" commands lie to the
user, giving them an incorrect idea of what's really happening, and
that this will create barriers to later understanding. I think the
same argument could be applied to say that there's no reason to have
"add", "rm", "resolve", and "update-index" (or "stage"). These
commands are all doing the same thing at a technical level, so why lie
to the user and let the user think they are doing something different?
My reply is that this isn't a lie, but it's providing names for the
user that match the operations that the user is conceptually
doing. That's called "providing a usable interface". If the user goes
on to learn the internals and discovers that these are all wrappers
around some shared core command, then the user can appreciate that
elegance of implementation. But forcing everyone to _use_ one command
for these conceptually separate operations would be a mistake from the
point-of-view of usability.