On Sat, 26 Aug 2006, Sergio Callegari wrote:
>
> Might the problem have come out of a scenario like the following...
>
> 1) I use unison to sync my documents (rather than using the git tools...
>    silly me!)
> 2) I get things wrong in controlling unison (without realizing that I
>    do) and the result is that I lose some blobs.
> 3) I repack an unclean tree (missing some objects)
>
> Can this be the case?

I do think that your synchronization using unison is _somehow_ part of
the reason why bad things happened, but I really can't see why it would
cause problems, and perhaps more importantly, git should have noticed
them earlier (and, in particular, failed the repack). So a git bug
and/or misfeature is involved somehow.

One thing that may have happened is that the use of unison somehow
corrupted an older pack (or you had a disk corruption), and that it was
missed because the corruption was in a delta of the old pack that was
silently re-used for the new one.

That would explain how the SHA1 of the pack-file matches - the repack
would have re-computed the SHA1 properly, but since the source delta
itself was corrupt, the resulting new pack is corrupt.

If you had used git itself to synchronize the two repositories, that
corruption of one repo would have been noticed when it transfers the
data over to the other side, which is one reason why the native git
syncing tools are so superior to doing a filesystem-level
synchronization.

With a filesystem-level sync (unison or anything else - rsync, cp -r,
etc), a problem introduced in one repository will be copied to another
one without any sanity checking.

(Which is not to say that the native protocol might not miss something
too, but it should be _much_ harder to trigger: for anything but the
initial clone, the native protocol will unpack all objects and recompute
their SHA1 hashes from first principles on the receiving side, rather
than trust the sender implicitly, so it's fundamentally safer.
But maybe we could be even _more_ anal somewhere).

So as a suggestion if you want to be careful:

 - only use "git fetch/pull" to synchronize two git repos, because
   that's inherently safer than any non-native synchronization.

 - if you repack reasonably often, do "git fsck-objects" (which is very
   cheap when there aren't a lot of unpacked objects) to verify the
   archive before "git repack -a -d" to repack it.

 - the "fsck-objects" thing won't catch a corrupt pack (unless you ask
   for it with "--full", which is expensive for bigger projects), but at
   least with "git fetch/pull", such corruption should not be able to
   replicate to another repository.

But in the meantime, when you find a place to put the corrupt pack/index
file, please include me and Junio at a minimum in the group of people
you tell where to find it (and/or passwords to access it). I'll happily
keep your data private (I've done it before for others).

		Linus
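
To illustrate the point above about the receiving side recomputing SHA1
hashes rather than trusting the sender, here is a minimal Python sketch.
It is not git's own code. It only assumes the object naming rule (SHA1
over a "<type> <size>" header, a NUL byte, then the contents), and the
helper names git_object_id and verify are made up for this example.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Name an object the way git does: SHA1 of '<type> <size>' + NUL + payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def verify(claimed_id: str, obj_type: str, payload: bytes) -> bool:
    """Recompute the name from the received bytes instead of trusting the sender."""
    return git_object_id(obj_type, payload) == claimed_id


# The sender announces an object name for some blob contents.
good = b"hello world\n"
claimed = git_object_id("blob", good)

# A single flipped byte in transit (or on disk) changes the recomputed name.
corrupt = b"hellp world\n"

print(verify(claimed, "blob", good))     # contents match the claimed name
print(verify(claimed, "blob", corrupt))  # mismatch, so the corruption is caught
```

A filesystem-level copy never performs this comparison, which is why a
damaged object can travel from one repository to another unnoticed.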