I've come across a couple of bugs. Most users will probably never
encounter them, but I think they ought to be fixed. Apologies if
they're well-known issues; I haven't read much of this list.

1. The import/export language handles distinct initial commits on the
same branch poorly: given two commits with the same branch name, it
assumes the latter is a descendant of the former (if there are no
"from" commands). Normally this is what you want, but if your project,
like git, ever merges distinct initial commits, then all but the first
will unexpectedly gain parents, corrupting the hashes of all their
descendants. For example:

  $ git clone git://git.kernel.org/pub/scm/git/git.git
  $ git checkout -b test 60ea0fdd7de001405fcc7591beb18a66a1f0dd09
  $ git fast-export test > /tmp/x
  $ cd /some/empty/dir
  $ git init
  $ git fast-import < /tmp/x
  $ git checkout test

Importing a pristine export, we discover Linus did not in fact make
the first commit to the git project:

  $ git log d154ebcc23bfcec2ed44e365af9e5c14c6e22015

As a workaround, I have a custom importer that knows git-fast-export
omits the "from" command in initial commits. But there should be a
command to state explicitly that the current commit is an initial
commit, allowing reliable export of projects such as git.

2. Kudos to everyone who figured out the nasty race condition and its
complex solution as described in Documentation/technical/racy-git.txt
and the comments of read-cache.c. It took me a while to get my head
around it. Unfortunately, the solution isn't perfect. Try this:

  $ echo xyzzy > file
  $ git update-index --add file  # don't zero size since contents match
  $ echo frotz > file            # all stats still match, contents don't
  $ echo nitfol > other          # can be done much earlier
  $ git update-index --add other # now the cached size is zeroed
  $ : > file                     # zero the file size muahahaha
  $ # Type fast! The above must take place within the same second! ;-)
  $ sleep 2
  $ echo filfre > other
  $ git update-index --add other # stats of "file" match, hash is wrong

Essentially, we trigger two index writes that avoid operations on
"file": one immediately after "file" is first committed and identified
as racily clean, and the second some time later, after we have
sneakily zeroed the file behind git's back (having previously edited
its contents in place). We defeat the safeguards and end up with a bad
hash in the index that appears valid. The "other" file merely causes
index writes without reading the "file" entry; it is also racily clean
in the above, but that is irrelevant.

It's unlikely this sequence of operations would occur in real usage,
but I'd sleep better if this index corruption bug were eliminated. One
practical but unsatisfying easy "fix" is to mark racily clean entries
with SIZE_MAX instead of 0: who uses git to track files of that size?

A better solution would be to introduce a new per-entry flag; let's
call it "dodgy". During a cache entry update, we set "dodgy" if the
mtime is greater than or equal to the index timestamp. During cache
entry reads, we check "dodgy": if clear, we trust the hash; otherwise
we don't trust it and recompute, again setting "dodgy" if necessary
(i.e. if the mtime matches the index timestamp again). Although this
solution does require adding a flag per index entry, we no longer have
to scan through the whole index on every index write to perform the
size-zeroing hack.

-Ben
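To make the ambiguity in point 1 concrete, here is a minimal
hand-written fast-import stream (not produced by any real exporter;
the file name, author, and messages are made up). Both commits are
meant to be roots of refs/heads/test, but because the second carries
no "from" command, fast-import chains it onto the first:

```
blob
mark :1
data 6
hello

commit refs/heads/test
mark :2
committer A U Thor <author@example.com> 1170000000 +0000
data 9
root one
M 100644 :1 greeting

commit refs/heads/test
mark :3
committer A U Thor <author@example.com> 1170000100 +0000
data 9
root two
M 100644 :1 greeting
```

There is no way to write the second commit so that it stays parentless
while reusing the existing branch name, which is exactly the missing
command argued for above.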