[RFC] Two conceptually distinct commit commands

Carl Worth <cworth@xxxxxxxxxx> · Mon, 04 Dec 2006 11:08:22 -0800

[
  I think the proposal below is original, and more correctly captures
  the essence of the "commit interface wart" than any previous
  proposal I've made. This proposal is also based entirely on what is
  useful for all git users, and what I perceive git's conceptual
  models to be. That is, this proposal concerns what _I_, (as a fairly
  experienced git user), actually want, without any bias for any
  assumptions about what an imagined "new user" might want. Notably,
  it does not try to satisfy naive (and likely incorrect) assumptions
  about git's model.

  Finally, this proposal intentionally uses ludicrously long command
  names. This is because a discussion of realistically short names
  triggers the two loaded issues of "muscle memory" and which concepts
  get blessed as "defaults". In previous threads, those issues have
  muddied the conceptual issues I'd like to focus on here. Let's talk
  about the concepts first, and save discussions of naming for later
  if necessary.
]

Proposal
--------
Here are the two commit commands I would like to see in git:

  commit-index-content [paths...]

    Commits the content of the index for the given paths, (or all
    paths in the index). The index content can be manipulated with
    "git add", "git rm", "git mv", and "git update-index".

  commit-working-tree-content [paths...]

    Commits the content of the working tree for the given paths, (or
    all tracked paths). Untracked files can be committed for the first
    time by specifying their names on the command-line or by using
    "git add" to add them just prior to the commit. Any rename or
    removal of a tracked file will be detected and committed
    automatically.
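
As a rough illustration of the two definitions above, here is a toy
Python sketch. It is not git's implementation: each of HEAD, the index,
and the working tree is modelled as a plain dict from path to content,
the function names simply mirror the long command names, "tracked" is
assumed to mean "known to HEAD or to the index", and rename detection
is not attempted.

```python
# Toy model only: a "tree" is a dict mapping path -> content.
# HEAD, the index, and the working tree are each one such dict.

def _commit_from(source, head, paths):
    """Return HEAD's tree with each path in `paths` taken from `source`."""
    tree = dict(head)
    for path in paths:
        if path in source:
            tree[path] = source[path]
        else:
            tree.pop(path, None)  # absent from source: record a removal
    return tree

def commit_index_content(head, index, paths=None):
    """Commit index content for `paths` (default: all paths in the index)."""
    if paths is None:
        # Include HEAD's paths too, so entries removed from the index count.
        paths = set(head) | set(index)
    return _commit_from(index, head, paths)

def commit_working_tree_content(head, index, worktree, paths=None):
    """Commit working-tree content for `paths` (default: all tracked paths)."""
    if paths is None:
        # Assumption: "tracked" means known to HEAD or to the index.
        paths = set(head) | set(index)
    return _commit_from(worktree, head, paths)

head = {"a.c": "v1", "b.c": "v1"}
index = {"a.c": "v2", "b.c": "v1"}                             # a.c staged
worktree = {"a.c": "v3", "b.c": "v2", "notes.txt": "scratch"}  # further edits

from_index = commit_index_content(head, index)
from_tree = commit_working_tree_content(head, index, worktree)
first_time = commit_working_tree_content(head, index, worktree, ["notes.txt"])
```

In this model, from_index records only what was staged, from_tree
records the working-tree state of every tracked path, and first_time
adds the untracked notes.txt while leaving a.c and b.c as they are in
HEAD.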

Rationale summary
-----------------
These two commands capture a distinct conceptual split that is useful
for what users want to do with git. The split is necessary and
sufficient to provide access to four different useful pieces of commit
machinery. This is more functionality than in current git, and is
provided with more clarity.

The semantics of the two commands above are distinct enough that any
given tutorial introduction to git could outline a complete work-flow
by using only one or the other of the two commands, (or by presenting
one first and then expanding to the other).

The conceptual split here is necessary. In general, neither of the two
commands can be defined in terms of the other. This is independent of
the fact that commit-index-content is more core and provides shared
machinery for commit-working-tree-content. It is also independent of
the fact that commit-working-tree-content _can_ be defined in terms of
commit-index-content in the special case of the "all tracked paths"
form.

The two-way split here is also sufficient. It provides access to four
different, and useful, pieces of commit machinery. Of the four, only
three of these pieces currently exist in git. The new behavior is that
of "commit-index-content paths..." and is actually quite useful as
described in the detailed rationale below.

Finally, the two-way split here is simpler and more clear than the
three different commit commands currently provided by git, ("commit",
"commit paths...", and "commit -a"). The improved clarity comes from
taking advantage of the following standard command-line convention:

	If optional arguments are omitted from a command, the command
	is semantically equivalent to some default argument being
	provided.

This convention is standard across many unix commands and is prevalent
in git itself, (such as commands like git-log defaulting to HEAD when
no revision specifier is provided). Note that this convention is not
followed by the current git-commit. The behaviors of "git commit" and
"git commit paths..." involve distinct semantics. It is not the case
that "git commit" is equivalent to "git commit paths..." with some
default argument supplied. Violating this command-line convention is
unkind in general, but it also steals "space" from the command-line
for implementing the semantics of "git commit" with the application of
a <paths...> limit. This is discussed in more detail below.
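
The convention can be made concrete with a small Python sketch. This is
a simplified model under stated assumptions, not git's code: trees are
dicts from path to content, both function names are hypothetical, and
the safety checks in the real "git commit paths..." are ignored. It
shows that the proposed command gives the same result whether the
default path list is omitted or spelled out, while the current
git-commit does not.

```python
def proposed_commit_index_content(head, index, paths=None):
    """Proposed: commit index content for paths; default is every index path."""
    if paths is None:
        paths = set(head) | set(index)
    tree = dict(head)
    for p in paths:
        if p in index:
            tree[p] = index[p]
        else:
            tree.pop(p, None)
    return tree

def current_git_commit(head, index, worktree, paths=None):
    """Simplified current behavior: no paths -> index; paths -> working tree."""
    if paths is None:
        return dict(index)
    tree = dict(head)
    for p in paths:
        if p in worktree:
            tree[p] = worktree[p]
        else:
            tree.pop(p, None)
    return tree

head = {"a.c": "v1", "b.c": "v1"}
index = {"a.c": "v2", "b.c": "v1"}      # a.c staged
worktree = {"a.c": "v3", "b.c": "v1"}   # a.c edited again after staging
all_paths = sorted(set(head) | set(index))

# Proposed command: omitting the paths equals supplying the default.
same = (proposed_commit_index_content(head, index)
        == proposed_commit_index_content(head, index, all_paths))

# Current git-commit: naming every path yields a different commit.
differs = (current_git_commit(head, index, worktree)
           != current_git_commit(head, index, worktree, all_paths))
```

Both same and differs come out true for this state: the staged "v2" of
a.c is committed by the no-argument form, but the working-tree "v3" is
committed as soon as a.c is named.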

So, by cleanly separating the two different useful git-commit
behaviors, and applying a standard command-line convention, we end up
with more functionality and less to teach. What's not to love? All
that would be missing is to come up with names for the two
commands. As I promised above, I'm going to avoid proposing any
binding of the concepts to realistic names here, but I will point out
that one of the "names" might very well be a command-line option
alteration of the other command.

Rationale details
-----------------
Although the conceptual split is only two commands, the actual
implementation of this functionality breaks down into four separate
internal behaviors, (based on whether doing "given paths" or "all
tracked paths"). Three of the four exist in git already, while the
fourth is new, (and also useful). Let's review each of the four along
with the names that git currently provides for them:

1. commit-index-content		# all paths in the index

    This functionality currently exists as "git commit" and is the
    oldest and definitely the "most core" git commit command. Until
    fairly recently, all other git commit commands could easily be
    described as a variation of this functionality.

2. commit-index-content paths...

    This functionality does not currently exist in any git commit
    command, as far as I know. The behavior is to commit only a
    (path-based) subset of the content that has been staged into the
    index.

    I was originally just going to say that this functionality "might
    be useful in some cases", but coincidentally Alan Chandler
    happened to request it just yesterday on the list:

	I have been editing a set of files to make a commit, and after editing each
	one had done a git update-index.

	At this point I am just about to commit when I realise that one of the files
	has changes in it that really ought to be a separate commit.

	So effectively, I want to do one of three things

	a) git-commit <that-file>

    It's interesting to note that either of the two solutions
    suggested in response to Alan might not work in general. For
    example, "git reset" would not be a satisfactory solution if the
    user had dirty content in any of the affected files compared to
    what was staged in the index. Similarly, just removing the
    safety-valve on the existing "git commit <that-file>" would commit
    the wrong content if the working-tree contents of <that-file> were
    dirty with respect to the index.

    Now, it might still sound far-fetched to imagine wanting to commit
    a subset of something staged in the index while also having dirty
    content, but it occurs to me that I would actually _love_ to have
    this capability. The case I would use it for is fairly common,
    (and something that I think will speak to Junio who often brings
    up a similar scenario).

    Here's where I would like this functionality:

	I receive a patch while I'm in the middle of doing other work,
	(but with a clean index compared to HEAD, which is what I
	usually have). The patch looks good, so I want to commit it right
	away, but I do want to separate it into two or more pieces,
	(commonly this is because I want to separate the "add a test
	case demonstrating a bug" part from the "fix the bug"
	part). So, if I could do:

	git apply --index
	git commit-index-content <files that add the test case>
	git commit-index-content

	Then this would do exactly what I want. I wouldn't even have
	to think about whether my local modifications are to any of
	the same paths as touched by the patch.

    Today, in this scenario, what I have to do is to create a
    temporary branch with a clean working tree, and then use the index
    to stage the commit there. That process involves a few annoyances,
    (stashing my dirty work, inventing a free name for the temporary
    branch (which usually involves "git branch -D tmp"), switching back
    when I'm done, and trying to remember to clean up the branch). The
    new capability would let me skip _all_ of that overhead and
    instead I could just delight in the beauty and power of the
    index. Woo-hoo!

3. commit-working-tree-content		# all tracked files

    This functionality currently exists as "git commit -a" and, while
    not _really_ old in git's history, its invention predates my
    initial exposure to git. It has almost always been described in
    terms of its implementation, ("first update the index for all
    paths in the index, then commit that index").

    One benefit of this description is that it forces the user to
    learn about the index up front, (and gain a better understanding
    of git's model). One cost is that the user is forced to learn a
    two-stage implementation for a single-step process, (commit my
    changes). I won't try to weigh the costs/benefits here, but
    compare this to the description in (4) below.

4. commit-working-tree-content paths...

    This functionality currently exists as "git commit paths..." and
    is the newest variant of any git-commit command described here.

    I think the evolution of the semantics of the "git commit
    paths..." command line is very instructive. There was a
    time when this command could be described in terms of a two-stage
    manipulation of "the" index just like "commit -a" is described
    today. That is:

	Old: first update the index for all specified paths, then
	     commit the index.

    But then the semantics were changed and the new description does
    not involve the index at all:

	New: Commit only the files specified on the command line.

    The old behavior is still available with the --include option, but
    nobody has ever come out in favor of that being a useful command,
    (I agree it is not useful at all). Meanwhile, the new (default)
    behavior has been strongly identified by Linus as extremely
    useful. Junio has recently noticed that the old --include behavior
    is more conceptually consistent with the classic, commit-the-index
    definition of the core "git commit", but that's not sufficient
    justification for promoting functionality that would never be
    useful.

    So the evolution of the current "commit paths..." shows that utility
    of functionality is a primary concern in defining the semantics of
    git commands. And that's wonderful.
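
The split-a-patch scenario from item 2 can be sketched in the same toy
style. Everything here is an assumption made for illustration: trees
are dicts from path to content, the file names are invented, and the
index is simply set to the state it would have once the patch has been
staged. The point of the sketch is that the commit function never looks
at the working tree, so unrelated local edits cannot leak into either
commit.

```python
def commit_index_content(head, index, paths=None):
    """Toy model: HEAD's tree with `paths` (default: all) taken from the index."""
    if paths is None:
        paths = set(head) | set(index)
    tree = dict(head)
    for path in paths:
        if path in index:
            tree[path] = index[path]
        else:
            tree.pop(path, None)
    return tree

head = {"parse.c": "buggy", "t/parse.sh": "old tests", "notes.c": "clean"}

# Unrelated work in progress; nothing below ever reads this dict.
worktree = dict(head)
worktree["notes.c"] = "half-finished edit"

# State of the index once the patch has been staged.
index = dict(head)
index["t/parse.sh"] = "old tests + failing test"
index["parse.c"] = "fixed"

# First commit: only the test case that demonstrates the bug.
commit1 = commit_index_content(head, index, ["t/parse.sh"])

# Second commit: everything else that is staged, i.e. the fix.
commit2 = commit_index_content(commit1, index)
```

In this model commit1 still contains the buggy parse.c alongside the
new test, commit2 contains the fix, and notes.c stays at its HEAD
content in both.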

In my opinion, what has happened with the evolution of "commit paths"
and "commit -a" is that a new conceptual commit behavior has been
invented, (what I've termed commit-working-tree-content), but it
hasn't been recognized yet as separate from the core
commit-index-content nature of "git commit". And there's some muddling
in that simply adding a <paths...> argument to "git commit" completely
changes its semantics, (which violates the command-line convention I
described above).
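
For comparison, the old and new meanings of "commit paths..." described
in item 4 can be modelled side by side. Again this is only a sketch
under assumptions: trees are dicts from path to content, the two
function names are hypothetical labels for the "Old" and "New"
descriptions, and git's safety checks are left out.

```python
def commit_paths_old(head, index, worktree, paths):
    """Old: update the index for the given paths, then commit the whole index."""
    staged = dict(index)
    for p in paths:
        if p in worktree:
            staged[p] = worktree[p]
        else:
            staged.pop(p, None)
    return staged

def commit_paths_new(head, index, worktree, paths):
    """New: commit only the named files, taken from the working tree."""
    tree = dict(head)
    for p in paths:
        if p in worktree:
            tree[p] = worktree[p]
        else:
            tree.pop(p, None)
    return tree

head = {"a.c": "v1", "b.c": "v1"}
index = {"a.c": "v2", "b.c": "v1"}      # a.c already staged
worktree = {"a.c": "v2", "b.c": "v2"}   # b.c edited but not staged

old_result = commit_paths_old(head, index, worktree, ["b.c"])
new_result = commit_paths_new(head, index, worktree, ["b.c"])
```

With b.c named on the command line, the old form also sweeps in the
staged change to a.c, while the new form commits b.c alone and leaves
a.c as it is in HEAD.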

-Carl