Re: Problem with pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sun, 27 Aug 2006, Sergio Callegari wrote:
>
> There is something that I still do not understand... (sorry if I ask stupid
> questions)...
> Since packs have an sha signature too, if there was a data corruption (disk or
> transfer), shouldn't that have been detected at the repack? I.e. doesn't
> repack -d verify the available data before cancelling anything?

The packs do have a SHA1 signature, but verifying it is too expensive for 
normal operations. It's only verified when you explicitly ask for it, ie 
by git-fsck-objects and git-verify-pack.

Now, for small projects we could easily verify the SHA1 csum when we load 
the pack, but imagine doing the same thing when the pack is half a 
gigabyte in size, and I think you see the problem.. Especially as most 
normal operations wouldn't even otherwise touch more than a small small 
fraction of the pack contents, so verifying the SHA1 would be relatively 
very expensive indeed.

Now, "git repack -a" is obviously special in that the "-a" will mean that 
we will generally touch all of the old pack _anyway_, and thus verifying 
the signature is no longer at all as unreasonable as it is under other 
circumstances. And very arguably, if you _also_ do "-d", then since that 
is a fairly dangerous operation with the potential for real data loss, you 
could well argue that we should do it.

However, since the data was _already_ corrupt in that situation, and since 
a "git-fsck-objects --full" _will_ pick up the corruption in that case 
both before and after, equally arguably it's also true that there's really 
not a huge advantage to checking it in "git repack -a -d".

In other words, in your case, the reason you ended up with the corruption 
spreading was _not_ that "git repack -a -d" might have silently not 
noticed it, but really the fact that unison meant that the corruption 
would spread from one archive to another in the first place.

NOTE! This is all assuming my theory that a packed entry was broken in the 
first place was correct. We obviously still don't _know_ what the problem 
was. So far it's just a theory.

Final note: a "git repack -a -d" normally actually _does_ do almost as 
much checking as a "git-fsck-objects". It's literally just the "copy the 
already packed object from an old pack to a new one" that it an 
optimization that short-circuits all the normal git sanity checks. All the 
other cases will effectively do a lot of integrity checking just by virtue 
of unpacking the data in the object that is packed, before re-packing it.

So it might well be the case that we should simply add an extra integrity 
check to the raw data copy in builtin-pack-objects.c: write_object().

Now, these days there is actually two cases of that: the pack-to-pack copy 
(which has existed for a long while) and the new "!legacy_loose_object()" 
case. They should perhaps both verify the integrity of what they copy.

Junio, comments?

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]