Re: Best way to check for a "dirty" working tree?

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 13 Jun 2011 17:22:48 -0500

Hi Dirk,

Dirk Süsserott wrote:

> I have a script which moves data from somewhere to my local repo and
> then checks it in, like so:
>
> -----------
> mv /tmp/foo.bar .
> git commit -am "Updated foo.bar at $timestamp"
> -----------
>
> However, before overwriting "foo.bar" in my working directory, I'd like
> to check whether my working tree is dirty (at least "foo.bar").

Interesting example.  Sensible, as long as you limit the commit to
foo.bar (i.e., "git commit -m ... --only foo.bar")!

> I tried
>
> A) if ! git diff-index --quiet HEAD -- foo.bar; then
>        dirty=1
>    fi

To piggy-back on what Ram wrote, this is a question about the
difference between porcelain (high-level) and plumbing (low-level)
commands.

Generally speaking, plumbing is meant to give more stable behavior for
scripts, in two ways:

 - On one hand we make a concerted effort to keep the command-line
   usage and output of plumbing stable.  By contrast, porcelain will
   change over time as we learn about the way people work.

 - On the other hand plumbing is designed to produce simple, reliable,
   and machine-friendly behavior.  For example, while "git checkout"
   will guess what the caller is trying to do based on whether its
   first argument is a branch name or a file, "git checkout-index"
   only accepts pathspecs.  Plumbing tends to produce parseable
   output and not to automatically spawn a pager when its output is
   going to the terminal or to change behavior based on configuration.
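
As a small illustration of the "parseable output" point, a script can
rely on "git diff-index --name-only" printing one path per line.  This
is only a sketch; the function name changed_paths is made up for the
example, and paths with unusual characters come out quoted unless you
add -z.

```shell
# Hypothetical helper: list tracked paths that differ from HEAD.
# Any arguments are passed through as path limiters.
changed_paths () {
	git diff-index --name-only HEAD -- "$@"
}
```

Calling "changed_paths foo.bar" prints "foo.bar" if that file differs
from HEAD and nothing otherwise.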

Now, a word of warning.  One aspect of this "do not second-guess the
caller" behavior is that low-level commands like "git diff-index"
blindly trust stat() information in the index, rather than going to
re-read a seemingly modified file and updating the index if the
content is not changed.  You can see this by running "touch foo.bar";
"git diff-index" will report the file as changed, until you use "git
update-index" to refresh the stat information:

	git update-index --refresh --unmerged -q >/dev/null || :
	if ! git diff-index --quiet HEAD -- foo.bar; then
		dirty=1
	fi

Alas, this doesn't seem to be documented anywhere (except for the
gitcore-tutorial(7))!  It ought to be.
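
To watch the stale-stat effect in isolation, something like the
following sketch should do.  The throwaway repository, the committer
identity, and the timestamp are all placeholders I picked for the
demonstration; I would expect it to report 1 before the refresh and 0
after.

```shell
# Throwaway repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo data >foo.bar
git add foo.bar
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m initial

# Change only the timestamp, not the content.
touch -t 200001010000 foo.bar

# diff-index trusts the now-stale stat data and reports a change.
stale=0
git diff-index --quiet HEAD -- foo.bar || stale=$?

# Refresh the cached stat information, then ask again.
git update-index --refresh --unmerged -q >/dev/null || :
refreshed=0
git diff-index --quiet HEAD -- foo.bar || refreshed=$?

echo "before refresh: $stale, after refresh: $refreshed"
```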

> Both A) and B) work. But which one is better/faster/more reliable?

I suspect the fastest (by virtue of saving a fork + exec and not
having to stat files twice, once for update-index and again for
diff-index) is

	git -c diff.autorefreshindex=true diff --quiet -- foo.bar

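by a sad accident of history --- the "opportunistic index refresh"
behavior it implements does not seem to be exposed as plumbing.
(Unlike "git diff-index HEAD", this compares the working tree against
the index, so it will not notice changes that are already staged.)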
If you are going to be performing such operations in a loop, then

	git update-index --refresh --unmerged -q >/dev/null || :
	for i in loop
	do
		... actions like diff-index that trust the index ...
	done

will be faster.  And the latter is plumbing, with all the niceties
that entails, so if I were in your shoes I'd use the latter.
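
Putting the pieces together for a script like yours, one possible
shape is sketched below.  The function name import_file, its
arguments, and the policy of refusing to overwrite a dirty file are my
assumptions, not something you described.

```shell
# Hypothetical helper: replace a tracked file with a new copy and
# commit only that file.
# Usage: import_file /tmp/foo.bar foo.bar
import_file () {
	src=$1
	dest=$2

	# Refresh cached stat data so diff-index gives no false positives.
	git update-index --refresh --unmerged -q >/dev/null || :

	# Refuse to overwrite uncommitted changes to the destination.
	if ! git diff-index --quiet HEAD -- "$dest"
	then
		echo "error: $dest has uncommitted changes" >&2
		return 1
	fi

	mv "$src" "$dest" &&
	git commit -q -m "Updated $dest at $(date)" --only -- "$dest"
}
```

The commit step fails if the new copy is identical to what is already
committed, so a caller may want to treat that case separately.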

Hope that helps,
Jonathan