On Sun, 27 Aug 2006, Sergio Callegari wrote:
>
> There is something that I still do not understand... (sorry if I ask stupid
> questions)...
> Since packs have an sha signature too, if there was a data corruption (disk or
> transfer), shouldn't that have been detected at the repack? I.e. doesn't
> repack -d verify the available data before cancelling anything?

The packs do have a SHA1 signature, but verifying it is too expensive for
normal operations. It's only verified when you explicitly ask for it, i.e.
by git-fsck-objects and git-verify-pack.

Now, for small projects we could easily verify the SHA1 csum when we load
the pack, but imagine doing the same thing when the pack is half a
gigabyte in size, and I think you see the problem.. Especially as most
normal operations wouldn't otherwise touch more than a small, small
fraction of the pack contents, so verifying the SHA1 would be very
expensive relative to the work actually being done.

Now, "git repack -a" is obviously special in that the "-a" means that we
will generally touch all of the old pack _anyway_, so verifying the
signature there is nowhere near as unreasonable as it is under other
circumstances. And if you _also_ do "-d", then since that is a fairly
dangerous operation with the potential for real data loss, you could well
argue that we should do it.

However, since the data was _already_ corrupt in that situation, and since
a "git-fsck-objects --full" _will_ pick up the corruption in that case
both before and after, it's equally arguable that there's really not a
huge advantage to checking it in "git repack -a -d".

In other words, in your case, the reason you ended up with the corruption
spreading was _not_ that "git repack -a -d" might have silently failed to
notice it, but really the fact that unison let the corruption spread from
one archive to another in the first place.

NOTE! This is all assuming that my theory (that a packed entry was broken
in the first place) is correct. We obviously still don't _know_ what the
problem was. So far it's just a theory.

Final note: a "git repack -a -d" normally actually _does_ do almost as
much checking as a "git-fsck-objects". It's literally just the "copy the
already packed object from an old pack to a new one" case that is an
optimization which short-circuits all the normal git sanity checks. All
the other cases will effectively do a lot of integrity checking just by
virtue of unpacking the object data before re-packing it.

So it might well be the case that we should simply add an extra integrity
check to the raw data copy in builtin-pack-objects.c: write_object(). Now,
these days there are actually two cases of that: the pack-to-pack copy
(which has existed for a long while) and the new "!legacy_loose_object()"
case. They should perhaps both verify the integrity of what they copy.

Junio, comments?

		Linus
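
To make the idea concrete, here is a minimal standalone sketch (not git's
actual write_object() code; the function name and the use of OpenSSL's
SHA1 routines are purely illustrative) of the kind of integrity check
being discussed: recompute the object's SHA1 over the "type size\0"
header plus the inflated contents and compare it with the name the object
is supposed to have.

	#include <stdio.h>
	#include <string.h>
	#include <openssl/sha.h>

	/*
	 * Sketch only: git names an object by the SHA1 of "type size\0"
	 * followed by the inflated object body, so recomputing that hash
	 * and comparing it with the name we are about to write would catch
	 * a corrupted source object before it gets copied into a new pack.
	 */
	static int object_matches_name(const unsigned char *expect_sha1,
	                               const char *type,
	                               const void *buf, unsigned long size)
	{
		char hdr[64];
		int hdrlen;
		unsigned char real_sha1[SHA_DIGEST_LENGTH];
		SHA_CTX ctx;

		/* the "type size" header, including its NUL, is hashed too */
		hdrlen = snprintf(hdr, sizeof(hdr), "%s %lu", type, size) + 1;

		SHA1_Init(&ctx);
		SHA1_Update(&ctx, hdr, hdrlen);
		SHA1_Update(&ctx, buf, size);
		SHA1_Final(real_sha1, &ctx);

		return !memcmp(real_sha1, expect_sha1, SHA_DIGEST_LENGTH);
	}

Note that the raw-copy path deals with still-deflated data, so applying a
check like this would mean inflating it first, i.e. giving up part of the
short-circuit that the copy optimization exists for.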