Re: Problem with pack

On Sat, 26 Aug 2006, Sergio Callegari wrote:
>
> Might the problem have come out of a scenario like the following...
> 
> 1) I use unison to sync my documents (rather than using the git tools...
> silly me!)
> 2) I get things wrong in controlling unison (without realizing that I
> do) and the result is that I lose some blobs.
> 3) I repack an unclean tree (missing some objects)
> 
> Can this be the case?

I do think that your synchronization using unison is _somehow_ part of the 
reason why bad things happened, but I really can't see why it would cause 
problems, and perhaps more importantly, git should have noticed them 
earlier (and, in particular, failed the repack). So a git bug and/or 
misfeature is involved somehow.

One thing that may have happened is that the use of unison somehow 
corrupted an older pack (or you had a disk corruption), and that it was 
missed because the corruption was in a delta of the old pack that was 
silently re-used for the new one.

That would explain how the SHA1 of the pack-file matches - the repack 
would have re-computed the SHA1 properly, but since the source delta 
itself was corrupt, the resulting new pack is corrupt.
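This kind of silently re-used corrupt delta is exactly what pack verification exists to catch. A minimal sketch of checking every existing pack before trusting a repack (git verify-pack is real and re-reads each object, deltas included, rather than just the trailing pack checksum; the loop and messages are just illustrative):

```shell
#!/bin/sh
# Verify every pack in the current repository.  verify-pack
# recomputes the per-object SHA1s inside the pack, so a corrupt
# delta is detected even when the pack's own trailer checksum
# was recomputed correctly at repack time.
for idx in .git/objects/pack/*.idx; do
    [ -e "$idx" ] || continue          # no packs yet, nothing to do
    if ! git verify-pack "$idx" >/dev/null; then
        echo "corrupt pack: $idx" >&2
        exit 1
    fi
done
echo "all packs verified"
```

Run from the top of a working tree; a clean repository prints "all packs verified".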

If you had used git itself to synchronize the two repositories, that 
corruption of one repo would have been noticed when it transfers the data 
over to the other side, which is one reason why the native git syncing 
tools are so superior to doing a filesystem-level synchronization.

With a filesystem-level sync (unison or anything else - rsync, cp -r, 
etc), a problem introduced in one repository will be copied to another one 
without any sanity checking.

(Which is not to say that the native protocol might not miss something 
too, but it should be _much_ harder to trigger: for anything but the 
initial clone, the native protocol will unpack all objects and recompute
their SHA1 hashes from first principles on the receiving side, rather than 
trust the sender implicitly, so it's fundamentally safer. But maybe we 
could be even _more_ anal somewhere).
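To make "recompute their SHA1 hashes from first principles" concrete: an object's name is just the SHA1 of a small type-and-size header followed by the contents, so the receiver can always rederive it independently of what the sender claims. A sketch, computing the well-known hash of the six-byte blob "hello\n" by hand and then letting git do it (both commands are real; the printf pipeline simply spells out the same computation):

```shell
#!/bin/sh
# A blob's object name is sha1("blob <size>\0<contents>").
# Recompute it by hand for the 6-byte blob "hello\n"...
printf 'blob 6\0hello\n' | sha1sum
# ...and let git do the identical computation:
echo hello | git hash-object --stdin
# both print ce013625030ba8dba906f756967f9e9ca394464a
```

Any corruption of the contents changes the recomputed name, which is why the receiving side notices when it does this check instead of trusting the sender.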

So, as a suggestion, if you want to be careful:

 - only use "git fetch/pull" to synchronize two git repos, because that's 
   inherently safer than any non-native synchronization.

 - if you repack reasonably often, do "git fsck-objects" (which is very 
   cheap when there aren't a lot of unpacked objects) to verify the 
   archive before "git repack -a -d" to repack it.

 - the "fsck-objects" thing won't catch a corrupt pack (unless you ask for 
   it with "--full", which is expensive for bigger projects), but at least 
   with "git fetch/pull", such corruption should not be able to replicate 
   to another repository.

but in the meantime, when you find a place to put the corrupt pack/index 
file, please include me and Junio at a minimum into the group of people 
who you tell where to find it (and/or passwords to access it). I'll 
happily keep your data private (I've done it before for others).

		Linus