Re: Suggestion: "verify/repair" option for 'git gc'

On Thu, Oct 14 2021, Alexandr Miloslavskiy wrote:

> On 14.10.2021 4:19, Ævar Arnfjörð Bjarmason wrote:
>> I'd be interested in a copy of it, I've been slowly trying to improve
>> these sorts of corruption cases.
>
> Sent.

Thanks. I can't promise I'll look at it in detail any time soon, but
I was going to loop back to checking out these corruption cases at some
point.

>> I wonder if this and other issues you encountered wouldn't need a full
>> "fsck", but merely gc triggering a complete repack.
>
> That sounds slow :( For example, it's going to be a lot of disk write
> bandwidth. Just doing the verification along with regular gc sounds
> faster.

Having looked at your repo, the immediate issue is that you've got a tree
in it that has a (manually crafted?) entry that points to a commit
object, without the entry's mode matching that:

    $ git cat-file -p 1d571d7354f99b726bbcc0cb232b3f47846c71a1
    100644 blob 0189425cc210555c36383293c468df5da73acc48    common.mak
    040000 tree 6a2c4a5ef0b0ee7aa85d88c3147b7558a6a7c29f    include

Was this created with git itself, or with some tool that's manually
crafting trees? The object on disk has exactly the expected content and
is bad only in this particular way, which points to corruption introduced
by git or another tool writing the data, not e.g. FS corruption or a
bit-flip.
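
To spot this kind of entry without a full repack, one option is to
compare the type that each entry's mode implies with the type of the
object actually stored. A rough sketch: the function name
check_tree_entry_types is made up for illustration, it only looks at one
tree level, it skips gitlink (submodule) entries, and it assumes paths
without unusual characters.

```shell
# Hypothetical helper, not part of git: report entries in one tree whose
# mode-implied type differs from the type of the object they point at.
check_tree_entry_types() {
    # "git ls-tree" prints "<mode> <type> <oid><TAB><path>". The <type>
    # column is derived from the mode, not from the stored object.
    git ls-tree "$1" | {
        bad=0
        while read -r mode expected oid path; do
            # Gitlinks legitimately name commits that live in another repo.
            [ "$expected" = commit ] && continue
            actual=$(git cat-file -t "$oid" 2>/dev/null) || actual=missing
            if [ "$actual" != "$expected" ]; then
                echo "$path: mode $mode implies $expected, but $oid is $actual"
                bad=1
            fi
        done
        # Exit status of the function: 0 if every entry matched.
        [ "$bad" -eq 0 ]
    }
}
```

Called as "check_tree_entry_types 1d571d7354f99b726bbcc0cb232b3f47846c71a1"
in this repo I'd expect it to flag common.mak, but I haven't run it there.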

Anyway, getting back on track: the "gc" command actually does do exactly
what you're suggesting:

    $ git gc; echo $?
    error: object 0189425cc210555c36383293c468df5da73acc48 is a commit, not a blob
    fatal: entry 'common.mak' in tree 1d571d7354f99b726bbcc0cb232b3f47846c71a1 has blob mode, but is not a blob
    fatal: failed to run repack
    128

The problem is that as a user you won't have seen that, because automatic
gc won't get that far until it hits the gc.auto limit. Once it did, it
would have run into this error, and you'd have had the contents of gc.log
spewed at you by other commands.
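
To see how far a repo is from that point, something like the following
could be used. It's only a sketch: the function names are made up, 6700
is the documented default for gc.auto when it isn't configured, and the
real auto-gc check estimates the loose object count rather than counting
exactly, so treat the numbers as approximate.

```shell
# Hypothetical helper: print the loose object count next to the gc.auto limit.
loose_vs_gc_auto() {
    # Fall back to git's documented default when gc.auto is not configured.
    limit=$(git config --get gc.auto || echo 6700)
    # "count:" in "git count-objects -v" is the number of loose objects.
    loose=$(git count-objects -v | sed -n 's/^count: //p')
    echo "loose=$loose limit=$limit"
}

# Hypothetical helper: show the error log left behind by a failed
# background "git gc --auto", if there is one.
show_gc_log() {
    log="$(git rev-parse --git-dir)/gc.log"
    if [ -s "$log" ]; then
        cat "$log"
    fi
}
```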

So maybe we should be more aggressive there, e.g. as a function of repo
size or whatever (this repo is 18MB).

You really don't need "git fsck" to verify FS corruption or basic object
graph issues like these, and I think it's rather unfortunate that we
expose it like that.

What it does over and beyond a full repack of the repo is to
exhaustively verify object contents, which is most useful e.g. if you're
running a git service and want to prevent users from pushing crafted
corrupt objects, either intentionally or unintentionally.

I've also been meaning to look at that aspect of it for a while, i.e. it
should be able to offer some sort of --fast mode. It has
--connectivity-only, but that one goes a bit too far, although in this
case it would have helped you.
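
Until something like that exists, the two existing levels can be chained
by hand. A minimal sketch, where staged_fsck is a made-up name and the
only git options used are ones fsck already has:

```shell
# Hypothetical wrapper: run the cheap check first, the exhaustive one after.
staged_fsck() {
    # Checks that objects referenced from reachable tags, commits and
    # trees are present, without reading blob contents.
    git fsck --connectivity-only || return
    # Full check, which also verifies object contents.
    git fsck
}
```

The idea is just to fail fast on graph problems before paying for the
full content verification.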

>> Yes, we still definitely have cases where dealing with this sort of
>> thing can be very painful.
>
> With the new remote promisor code, I think that auto-fixing corrupted
> blobs is easy enough (provided they can be found on any remote) ?

Hypothetically, but these blobs aren't corrupted, and no amount of
fetching something from other places is going to fix a bad DAG. If that
thing didn't really point at the wrong object type the hash would be
different. The problem is that it was wrong when it was written.
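
The point about the hash can be seen with plain git plumbing: a tree's
ID is computed over its entries, mode included, so the same name and
blob under a different mode give a different tree. A small sketch, with
tree_with_mode as a made-up helper name:

```shell
# Hypothetical helper: write a one-entry tree and print its object ID.
# $1 = file mode (e.g. 100644), $2 = blob ID, $3 = file name
tree_with_mode() {
    printf '%s blob %s\t%s\n' "$1" "$2" "$3" | git mktree
}
```

Calling it with 100644 and then 100755 for the same blob and name prints
two different IDs. Note that mktree itself refuses an entry whose mode
doesn't match the type of an existing object, so it can't be used to
reproduce the bad tree from this repo.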

I say "hypothetically" because even in the case of a bitflip or whatever,
coercing git into some sort of auto-repair mode is pretty far off. I've
been able to do it manually in some cases, but e.g. promisors not having
a blob *and knowing that* is very different from the more general cases
of object(s) XYZ being corrupt.

We have well-intentioned features like the collision detection that
actively get in the way of some repairs like that (I had a patch[1] to
disable it, which as a side-effect made recovering from some forms of
corruption easier).

But even without that you'll find that e.g. if a recent object is bad,
and we'd like to fetch it from upstream, that we're just going to die
pretty early, as none of the code involved in say incremental fetching
is prepared to run across a bad/corrupt object.

Those aren't inherent problems, just limitations of the current
implementation, and it would be very nice to have more such auto-repair
in git.

1. https://lore.kernel.org/git/20181028225023.26427-5-avarab@xxxxxxxxx/ 



