Re: Separating "add path to index" from "update content in index"

Carl Worth <cworth@xxxxxxxxxx> · Fri, 22 Dec 2006 13:57:19 -0800

On Fri, 22 Dec 2006 00:06:32 -0500 (EST), Nicolas Pitre wrote:
> On Thu, 21 Dec 2006, Carl Worth wrote:
>
> > So, I think what I really want here is a complete separation in the
> > interface between adding a path to the index and updating content into
> > the index.
>
> Strangely enough I think this separation is unnecessary and redundant.

One argument I would make in favor of the separation is that the two
operations are conceptually distinct from the user's point of view. But
that's really hard to nail down, since users have different points
of view and different conceptual models (though I think the recent
post about similar file names and accidentally adding a file meant to
be untracked is evidence in favor of this argument).

There's a much less fuzzy, and strictly technical argument that can be
made. Right now, we document "git add" as being useful for two
purposes, ("adding new files" and adding "modified files...to the set
of changes"). These two operations can be described as:

	1. Add new path to the index, and update the content

	2. Update the content for an existing path

The technical argument for separating the notions of "add path" and
"update content" comes from looking at how to specify path names to
these operations, (and recursive names in particular).

By definition, the first operation ("add new path") must accept
path names from the working tree as it exists in the filesystem,
since without this operation there's no way to get paths into the
index in the first place. So any recursive form of this operation
should traverse the tree of files as they exist in the filesystem.
This is quite useful in the case of creating a new directory
in a project and wanting to add all of the files in that directory:

	git add new-directory

However, the second operation ("update content") need not be defined
in terms of the tree in the filesystem, and is in fact quite useful
when it operates on the tree that exists "within" git. For example, if
I have been hacking on a feature change in a 'source' directory that's
not quite finished yet, and also a new test for that feature in a
'test' directory (that I do consider ready), then it would be
convenient to be able to stage all the content in the 'test' directory
by similarly specifying just the directory name. For example, I'd like
to be able to do:

	git update-index test

(which doesn't actually work right now). But doing "git add test"
would be wrong, since it would also add any untracked files, and I
don't want that. This "update content for known files" operation
should recurse on the tree that git knows about, and not the tree of
files in my filesystem.
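
A rough sketch of that behavior with existing plumbing, recursing over
the paths the index already knows rather than over the filesystem. The
name refresh_tracked is a hypothetical helper, not a git command, and
passing --remove (so that tracked files deleted from the working tree
are dropped from the index instead of causing an error) is an
assumption about what "update content" should mean for a missing file:

```shell
# Hypothetical helper: update index content only for already-tracked paths.
refresh_tracked() {
    # "git ls-files" with no mode flags lists paths from the index, so
    # untracked files under the given directories are never seen here.
    git ls-files -z -- "$@" |
        # Re-read each listed path from the working tree into the index.
        # --remove is an assumption: deleted tracked files leave the index.
        git update-index -z --remove --stdin
}
```

With this, "refresh_tracked test" stages edits to known files under
'test' and leaves any untracked files there alone.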

> > We've long had a command that updates content to the index, and it
> > takes a command-line option (--add) to allow it to first do the
> > necessary path addition as well.
>
> And it is still there.

Yes, update-index still exists. But we're relegating that to
plumbing. What I'm proposing is that we should have a porcelain
command that does just the "update content for known files" part, and
that merging this with something that makes files "become known" to
git for the first time is a mistake.

> The problem lies with the git-diff interface then, not git-add.

I don't think so. I'm quite convinced that the fact that "git diff"
shows the difference from the index to the working tree is correct and
can't really be changed. The issue I'm talking about here is that for
"tracked" files git currently provides a way for me to have my edits
of I can

> > I think the best would be:
> >
> > 	git update-index --all
> >
> > which would still allow room for:
> >
> > 	git add --all
...
> There is no consistency needed between git-add and git-update-index.
> The first is for users while the second is more suited for scripting
> your own interface.

But it's not actually update-index that I want. I agree that it's a
plumbing thing that users shouldn't use. What I want is two different
pieces of porcelain here, each focusing on one simple task. One to
add the path to the index, and one to update content in the index for
a path that exists.

> > 	git refresh --add
> > or:
> > 	git add --refresh
> >
> > would provide the behavior that currently is provided by "git add",

That was actually a bad idea, and I'll retract that part. Neither of
these options should exist. We already have an all-singing,
all-dancing git-update-index that can do anything we want. We really
don't need two new pieces of porcelain that also do everything
update-index does but just have different defaults.

Much better would be for "git add" and "git refresh" to each just
stick to a single task and to do it well, (git has UNIX philosophy,
right?). So "git add" should just add paths to the index, "git
refresh" should just update content for existing paths in the index,
and we don't need a lot of options for either command for users to
have to wade through.

With those simple commands, we could have nice, separate behavior for:

	git add some-dir
and:
	git refresh some-dir

and if someone wants the existing "add path and update content"
behavior of git add then it should be a simple matter of aliasing to
the combination of "git add" followed by "git refresh".
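
As a sketch only, the two single-purpose operations and their
combination could be emulated with plumbing along these lines. All
three function names are hypothetical. Recording a new path with an
empty blob as its staged content is one assumed way to "add a path"
without staging its real content. The sketch also assumes regular
files with mode 100644 and path names without newlines or other
characters that ls-files would quote:

```shell
# Hypothetical: make untracked paths known to the index without
# staging their current content (an empty blob stands in for it).
add_path_only() {
    empty=$(git hash-object -w --stdin </dev/null) || return
    # --others walks the filesystem, which is what "add path" needs.
    git ls-files --others --exclude-standard -- "$@" |
    while IFS= read -r path; do
        git update-index --add --cacheinfo 100644 "$empty" "$path"
    done
}

# Hypothetical: update content only for paths already in the index.
refresh() {
    git ls-files -z -- "$@" | git update-index -z --remove --stdin
}

# The existing "add path and update content" behavior as a combination.
add_and_refresh() {
    add_path_only "$@" && refresh "$@"
}
```

After "add_path_only some-dir", "git diff" shows the whole of each new
file as unstaged changes. A later "refresh some-dir" stages it.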

> I think you are trying to solve the wrong problem, or at least solve a
> problem the wrong way.  The problem is that git-diff doesn't give you
> the output you expect because of the index interfering in your work
> flow.  And I understand that.

I don't think that's the right characterization. I like that "git
diff" works from the index, and I take advantage of that by
intentionally putting content into the index. The problem is that "git
add" currently forces content into the index even if I consciously do
not want it there yet. I just want a way to tell "git diff", (and
commit -a), to start looking at new files, but without staging the
current content of those files into the index.

> But the best solution is really for git-diff to have a mode where you
> could display a diff between the work tree and the index, _or_ the index
> and HEAD, for each file listed in the index while giving priority to the
> former.

I don't understand what you are proposing here. What would this mode
display? How would it decide?

> With this, for users accustomed to "commit -a", the natural and pretty
> consistent way to see a diff for such a commit before actually
> performing it would be "diff -a".  Isn't it logical?

A new option ("git diff -a") doesn't help much. There's already "git
diff HEAD" and I understand what it does. The problem is having "git
diff" usually work, and then having to remember to do something else
when it doesn't do the right thing.

[Though a command-line option would have one advantage over HEAD which
 is that it's easier to document command-line options than a magic
 name like HEAD. This "hard to document" bug is something that affects
 all of the magic names, (HEAD, ORIG_HEAD, MERGE_HEAD, etc.), and
 keeps their functionality quite hidden from new users of git.]

-Carl