Re: Problem with pack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 27 Aug 2006, Linus Torvalds wrote:

> Now, "git repack -a" is obviously special in that the "-a" will mean that 
> we will generally touch all of the old pack _anyway_, and thus verifying 
> the signature is no longer at all as unreasonable as it is under other 
> circumstances. And very arguably, if you _also_ do "-d", then since that 
> is a fairly dangerous operation with the potential for real data loss, you 
> could well argue that we should do it.

We definitely should do it with -d.

> However, since the data was _already_ corrupt in that situation, and since 
> a "git-fsck-objects --full" _will_ pick up the corruption in that case 
> both before and after, equally arguably it's also true that there's really 
> not a huge advantage to checking it in "git repack -a -d".

There really is an advantage.  Given that the absence of -d leaves old 
objects/packs around, there is a greater chance for still finding an 
early copy of the bad object.

> In other words, in your case, the reason you ended up with the corruption 
> spreading was _not_ that "git repack -a -d" might have silently not 
> noticed it, but really the fact that unison meant that the corruption 
> would spread from one archive to another in the first place.

But -d would definitely delete old packs that could have had a non 
corrupted copy of the desired object.

> Final note: a "git repack -a -d" normally actually _does_ do almost as 
> much checking as a "git-fsck-objects". It's literally just the "copy the 
> already packed object from an old pack to a new one" that it an 
> optimization that short-circuits all the normal git sanity checks. All the 
> other cases will effectively do a lot of integrity checking just by virtue 
> of unpacking the data in the object that is packed, before re-packing it.
> 
> So it might well be the case that we should simply add an extra integrity 
> check to the raw data copy in builtin-pack-objects.c: write_object().
> 
> Now, these days there is actually two cases of that: the pack-to-pack copy 
> (which has existed for a long while) and the new "!legacy_loose_object()" 
> case. They should perhaps both verify the integrity of what they copy.

I think that git-pack-object should grow another flag: --verify-src or 
something.  That flag would force the verification of the pack checksum 
for any pack used for the repack operation.  Then it should be used 
anytime -d is provided to git-repack.  When -d is not provided then we 
can skip that --verify-src security measure since nothing gets deleted 
and therefore no risk of loosing a good object over a corrupted one 
would happen.


Nicolas
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]