Re: [PATCH] make 'git add' a first class user friendly interface to the index

Carl Worth <cworth@xxxxxxxxxx> · Sat, 02 Dec 2006 01:06:57 -0800

On Fri, 01 Dec 2006 23:54:38 -0800, Junio C Hamano wrote:
> > Wow, this index stuff sure takes a lot of explaining. Why are users
> > better off having to grasp all of that stuff before they can
> > successfully add; edit; #oops, add again; and commit their files?
>
> Jumping the index is not about that sequence.  It is about being
> interrupted while doing something else, and committing a smaller
> trivial change first that is independent from what you have been
> doing.  Beginners do not have to do that "interrupted work"
> sequence.

I guess my point is, the only arguments I've heard against changing the
default behavior of "git commit" are:

1. It's always been the way it is

  This is a legitimate concern, yes. It might justify a big bump to
  git's version number, or a new configuration option that the
  old-timers would set, or maybe the "default" I want could be a new
  configuration option that would be set by default for new clones.

  Whatever. There's an inertia problem here, but that hasn't been the
  strong push-back I've been getting.

2. Doing anything other than the way it is would "deny the index"

  This argument has been made forcefully, and again and again.

  But I don't think it stands at all. The current behavior of
  "git-commit files..." denies the index just as much. Just look at
  the documentation for git-commit. It starts out with a technical,
  index-based description:

	Updates the index file for given paths, or all modified files
	if -a is specified, and makes a commit object.

  Now, that description doesn't explicitly say from _what_ the commit
  object is created, but a natural reading would be "from the updated
  index". And historically, that is exactly what "git commit files..."
  did. I'm sure this wording is fairly old.

  However, what "git commit files..." does today is very
  different. It's a bit hard to track it down in the man page, but
  eventually you end up with:

	"Commit only the files specified on the command line."

  What does that even mean in terms of the index? I don't even know
  the precise details. And I don't think there's even a very clean way
  to describe it. (The documentation already starts to get a bit messy
  where it has to describe that certain index states will make "git
  commit files..." balk completely).

  So "git commit files..." already "denies the index" just as much as
  my proposed default behavior for "git commit". Why? Because it's
  _useful_, that's why. The old "git commit files..." behavior was
  much more consistent in terms of index manipulation, but Junio got a
  scalding email from Linus when he suggested reverting that behavior.

  If you try to think about all the index manipulations of "git commit
  files..." you'll actually get fairly confused. But has there been
  some problem with people failing to be able to learn the index as a
  result? Has anyone ever even run into this confusion?  No. Because,
  "git commit files..." does exactly what you actually _want_ to do,
  and that operation is really easy to describe without any confusion:

	"Commit only the files specified on the command line."

So, we can come up with just as short descriptions for the other
useful git commands:

	commit -a	Commit all files tracked by git

	commit		Commit all files as they exist in the index

Think about when the behavior of these commands is the same, and think
about when they are different. If they're different, think about what
situations make that difference _useful_, what does the user _want_ to
do? And finally, what did the user have to do to arrive at that
situation?

 * The commands are the same when resolving a merge.

 * The commands are different when explicitly staging a commit. This
   difference is useful---a point which has also been made forcefully,
   again and again. This situation arises when a user explicitly
   executes a command to stage something into the index, (historically
   with "update-index" and now proposed for "add").

 * The commands are different after adding a new file to be tracked by
   git for the first time. This difference is not useful. This
   situation arises whenever a file is added and subsequently
   edited. It's not necessarily the case that the user is _trying_ to
   do any staged commit, (and most commonly the user is not).

The recent "git add" conversation conflates these last two use cases,
which is a bit problematic because one is useful to the user while the
other is not.
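
To make the third case concrete, here is a minimal sketch of the
add; edit; commit sequence. It assumes a throwaway repository, and the
file name (hello.txt) and identity values are made up for
illustration:

```shell
#!/bin/sh
# Sketch: a file is added, edited again, then committed two ways.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity so the commits work in a scratch repository.
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > hello.txt
git add hello.txt                 # the index now holds "first draft"
echo "second draft" > hello.txt   # edited again, not re-added

git commit -q -m "add hello.txt"  # plain commit records the index
committed=$(git show HEAD:hello.txt)
echo "plain commit recorded: $committed"

git commit -q -a -m "update hello.txt"   # -a picks up the later edit
committed_all=$(git show HEAD:hello.txt)
echo "commit -a recorded: $committed_all"
```

The user here never set out to stage anything, yet the plain commit
records the older content of the file.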

> We say "you should add modified state again if you edit it again
> after you added it" in a section before these sentences, and
> encourage users to consistently say 'git add'.

I think this is a mistake for documentation that will be encountered
early by new users, (as git-add is one of the first things a user must
use if starting with git from scratch as opposed to through a
clone). The problem is that all the talk of "git add + git commit"
easily leads to the impression that there's more work to do in git
than in any other system that anyone may have ever encountered.

Now, there isn't actually more work, and we can explain that later,
"you will most commonly not use the sequence explained above, but will
instead use 'git commit -a' which will perform both steps for you".

This is the kind of sentence in documentation which just screams that
there's a user-interface problem. Why do we explain how to do
something only to say a moment later that users won't do that? The
reason is because we _have_ to explain git-add that way or else the
current semantics of git-add + git-commit can be very confusing.

Let's just eliminate that confusion, drop the stuff from the
docs that makes git seem like it's harder to use than anything else on
the planet, and save the discussion of the index for a section in the
documentation that deals with something the user is wanting to do that
actually _benefits_ from the index.

> By the way, aren't people disturbed that "git rm" does not
> default to "-f" -- I rarely use the command myself but that
> makes it feel even more awkward that "git rm foo" does not
> remove the file "foo".

Yes, it's usually a bug that it doesn't delete the file by
default. This one's my doing, but I was thinking of an actual
situation that I had been in, that of wanting to undo an "add". For
example:

	git add file1
	git add file2
	# Oh, wait, I should commit these independently
	git rm file2
	git commit -m "add file1"
	git add file2
	git commit -m "add file2"

So one way to fix this would be to make "git rm" delete the file if it
is consistent in working-tree and HEAD and to leave it there
otherwise. The message could be something like:

	Note: file <foo> has uncommitted changes, leaving it in
	the working tree as an untracked file.

Then, -f could still be useful as a way to force file deletion even in
this case.
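
As a rough sketch of that rule, and only that: the wrapper below is a
hypothetical shell function (rm_if_clean is a made-up name), not how
"git rm" is implemented. It assumes "git diff --quiet HEAD" as the
test for "consistent in working-tree and HEAD" and uses
"git rm --cached" to untrack a file while leaving it on disk:

```shell
#!/bin/sh
# Hypothetical wrapper illustrating the proposed rule; not git's own code.
set -e

rm_if_clean() {
    # "git diff --quiet HEAD -- <path>" exits 0 only when the
    # working-tree file matches what HEAD records for that path.
    if git diff --quiet HEAD -- "$1"; then
        git rm -q -- "$1"            # clean: remove from index and disk
    else
        git rm -q --cached -- "$1"   # otherwise: untrack, keep the file
        echo "Note: file $1 has uncommitted changes, leaving it in"
        echo "the working tree as an untracked file."
    fi
}

# Demonstration in a scratch repository (names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file1
git add file1
git commit -q -m "add file1"
echo two > file2
git add file2          # staged, but never committed

rm_if_clean file1      # matches HEAD, so the file is deleted
rm_if_clean file2      # not in HEAD, so the file stays on disk
```

With this, undoing an "add" keeps the file, while removing a file that
is already committed and unmodified deletes it.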

> Well, I think at least we are converging.

I'm glad you feel that way. I know I've been something of a pest
recently, (and yes, Linus, I do often get weary of pests that want to
throw out the fundamental strengths of a system like X).

Maybe think of it this way: I've been arguing on behalf of
brain-damaged users. Git's got the cure for them, but they're not
ready to sign up for that kind of brain surgery when they can see it
coming. If we can subdue them with a more gentle introduction, ("start
counting the everyday git commands backwards from 10 to 1"), then
we'll have their brains and can do everything we want to them.

And I really think the re-training can be painless---I don't think the
proposals I'm making will set up any nasty surprises down the road.

-Carl