I've come across a couple of bugs. Most users will probably never
encounter them, but I think they ought to be fixed. Apologies if
they're well-known issues; I haven't read much of this list.

1. The import/export language handles distinct initial commits on the
same branch poorly: given two commits with the same branch name, it
assumes the latter is a descendant of the former (if there are no
"from" commands). Normally this is what you want, but if your project,
like git, ever merges distinct initial commits, then all but the first
will unexpectedly gain parents, corrupting the hashes of all their
descendants. For example:

  $ git clone git://git.kernel.org/pub/scm/git/git.git
  $ git checkout -b test 60ea0fdd7de001405fcc7591beb18a66a1f0dd09
  $ git fast-export test > /tmp/x
  $ cd /some/empty/dir
  $ git init
  $ git fast-import < /tmp/x
  $ git checkout test

Importing a pristine export, we discover Linus did not in fact make
the first commit to the git project:

  $ git log d154ebcc23bfcec2ed44e365af9e5c14c6e22015

As a workaround, I have a custom importer that knows git-fast-export
omits the "from" command in initial commits. But there should be a
command to state explicitly that the current commit is an initial
commit, allowing reliable export of projects such as git.

2. Kudos to everyone who figured out the nasty race condition and its
complex solution as described in Documentation/technical/racy-git.txt
and the comments of read-cache.c. It took me a while to get my head
around it. Unfortunately, the solution isn't perfect. Try this:

  $ echo xyzzy > file
  $ git update-index --add file  # don't zero size since contents match
  $ echo frotz > file            # all stats still match, contents don't
  $ echo nitfol > other          # can be done much earlier
  $ git update-index --add other # now the cached size is zeroed
  $ : > file                     # zero the file size muahahaha
  $ # Type fast! The above must take place within the same second! ;-)
  $ sleep 2
  $ echo filfre > other
  $ git update-index --add other # stats of "file" match, hash is wrong

Essentially, we trigger two index writes that avoid operations on
"file": one immediately after "file" is first committed and identified
as racily clean, and the second some time later, after we have
sneakily zeroed the file behind git's back (having previously edited
its contents in place). We defeat the safeguards and end up with a bad
hash in the index that appears valid. The "other" file merely causes
index writes without reading the "file" entry; it is also racily clean
in the above, but that is irrelevant.

It's unlikely this sequence of operations would occur in real usage,
but I'd sleep better if this index corruption bug were eliminated. One
practical but unsatisfying easy "fix" is to mark racily clean entries
with SIZE_MAX instead of 0: who uses git to track files of that size?

A better solution would be to introduce a new per-entry flag; let's
call it "dodgy". During a cache entry update, we set "dodgy" if the
mtime is greater than or equal to the index timestamp. During cache
entry reads, we check "dodgy": if clear, we trust the hash; otherwise
we don't trust it and recompute, again setting "dodgy" if necessary
(i.e. if the mtime matches the index timestamp again). Although this
solution does require adding a flag per index entry, we no longer have
to scan through the whole index on every index write to perform the
size-zeroing hack.

-Ben
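To make the ambiguity in point 1 concrete, here is a minimal
hand-written fast-import stream (not produced by any real exporter;
the file name, author, and messages are made up). Both commits are
meant to be roots of refs/heads/test, but because the second carries
no "from" command, fast-import chains it onto the first:

```
blob
mark :1
data 6
hello

commit refs/heads/test
mark :2
committer A U Thor <author@example.com> 1170000000 +0000
data 9
root one
M 100644 :1 greeting

commit refs/heads/test
mark :3
committer A U Thor <author@example.com> 1170000100 +0000
data 9
root two
M 100644 :1 greeting
```

There is no way to write the second commit so that it stays parentless
while reusing the existing branch name, which is exactly the missing
command argued for above.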