On Sun, Dec 2, 2012 at 9:58 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Sat, Dec 1, 2012 at 6:31 PM, Sitaram Chamarty <sitaramc@xxxxxxxxx> wrote:
>> Background: I have a situation where I have to fix up a few hundred
>> repos in terms of 'git gc' (the auto gc seems to have failed in many
>> cases; they have far more than 6700 loose objects). I also found
>> corrupted objects in some cases that prevent the gc from completing.
>>
>> I am running "git gc" followed by "git fsck". The majority of the
>> repos I have worked through so far appear to be fine, but in the
>> larger repos (upwards of 2-3 GB) the git fsck is taking almost 5
>> times longer than the 'gc'.
>>
>> If I could assume that a successful 'git gc' means an fsck is not
>> needed, I'd save a lot of time. Hence my question.
>
> Not really. For example, fsck verifies that every blob, when fully
> decompressed and inflated, matches its SHA-1. gc only checks

OK, that makes sense. After I posted I happened to check using strace
and had guessed as much from what I saw, but it's nice to have
confirmation.

> connectivity of the commit and tree graph by making sure every object
> is accounted for. But when creating the output pack it only verifies
> that the CRC-32 was correct while copying the bits from the source to
> the destination; it does not verify that the data decompresses and
> matches the SHA-1 it should.
>
> So it depends on what level of check you need to feel safe.

Yup; thanks. All the repos my internal client manages are mirrored in
multiple places, and they set (or were at least told to set, heh!)
receive.fsckObjects, so the lesser check is fine in most cases.

-- 
Sitaram
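
PS: for the archives, a rough sketch of the batch run I described
above, in case it helps someone. The names repos.txt and failed.log
are made up, and the repos are assumed to be bare (hence --git-dir):

  # gc + fsck every repo listed in repos.txt, logging failures
  # instead of stopping at the first bad repo
  while read -r repo; do
      echo "== $repo"
      git --git-dir="$repo" count-objects -v        # 'count:' = loose objects
      git --git-dir="$repo" gc   || echo "gc failed:   $repo" >>failed.log
      git --git-dir="$repo" fsck || echo "fsck failed: $repo" >>failed.log
  done <repos.txt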
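
PPS: Shawn's point is easy to see for yourself in a throwaway repo
(don't try this anywhere real): flip a byte inside a loose object,
and fsck notices, because it re-inflates the data and recomputes the
SHA-1:

  git init /tmp/fsck-demo && cd /tmp/fsck-demo
  echo hello >f && git add f && git commit -m one
  obj=$(git rev-parse HEAD:f)                 # the blob's SHA-1
  path=.git/objects/${obj:0:2}/${obj:2}       # its loose-object file
  chmod u+w "$path"                           # loose objects are read-only
  printf '\xff' | dd of="$path" bs=1 seek=10 conv=notrunc
  git fsck                                    # reports the corrupt object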
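
And for completeness, the config the mirrors are meant to carry
(transfer.fsckObjects is the single knob covering both directions):

  git config receive.fsckObjects true   # verify objects arriving via push
  git config fetch.fsckObjects true     # verify objects arriving via fetch
  # or, equivalently, both at once:
  git config transfer.fsckObjects true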