Re: git fetch --prune fails with "fatal: bad object"

Jeff King <peff@xxxxxxxx> · Tue, 4 Jun 2024 06:44:37 -0400

On Sat, Jun 01, 2024 at 08:53:43AM -0700, Junio C Hamano wrote:

> Interesting.  "git fsck" certainly can be used to help you find out
> about them.  In a throw-away repository, after manually crafting
> some "broken refs" (because update-ref will refuse to create a ref
> pointing at a missing object):
> 
>     $ git for-each-ref
>     9e830ad6c4f43159cef50cb1c2205f513c79bc8b commit refs/heads/master
>     $ echo 9e830ad6c4f43159cef50cb1c2205f513c79bc8a >.git/refs/heads/broken-missing
>     $ git rev-parse master: >.git/refs/heads/broken-tree
>     $ git rev-parse "master:foo/baz" >.git/refs/heads/broken-blob
> 
> running "git fsck" does tell you about them, ...
> 
>     $ git fsck
>     Checking object directories: 100% (256/256), done.
>     error: refs/heads/broken-blob: not a commit
>     error: refs/heads/broken-missing: invalid sha1 pointer 9e830ad6c4f43159cef50cb1c2205f513c79bc8a
>     error: refs/heads/broken-tree: not a commit
> 
> ... and using the information, you can
> 
>     $ for r in refs/heads/broken-{blob,missing,tree}
>       do git update-ref -d "$r"
>       done
> 
> to unbreak the repository.

These are good examples. I was going to suggest fsck, as well, just
because I knew it would keep going after seeing bogus results. But more
interesting is that it is finding things in your example that other
programs would _not_ find, because it's being more thorough than just
reading the refs.

Having to manually convert the human-readable fsck output to a cleanup
command is a minor pain. We could provide "git fsck --prune-broken-refs"
or something, but I'm hesitant. Deleting refs in a corrupted repository
is a good way to make recovery even harder, as it opens the door to
removing whatever objects we do still have.

In the case of a refs/remotes entry where you happen to know that you
could re-clone from the other side, it is relatively low stakes. But I
think keeping a human brain in the loop between corruption and deletion
is a good thing. Corruption should not be happening so often that it's a
major pain point.
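
For what it's worth, the manual conversion can be kept small without
automating the deletion itself. The following is only a rough sketch: it
assumes fsck's complaints keep the "error: <ref>: <message>" shape shown
in the output above (which is human-readable output, not a stable
interface), and the function name list_broken_refs is made up for
illustration. It prints the ref names and deletes nothing.

```shell
# Hypothetical helper (not part of git): read "git fsck" output on stdin
# and print each ref named in an "error: refs/...: <message>" line.
# Assumes the error-line shape shown earlier in this thread; refnames
# cannot contain ":", so the first ": " ends the refname.
list_broken_refs () {
	sed -n 's|^error: \(refs/[^:]*\): .*|\1|p' |
	sort -u
}

# Usage (fsck writes these errors to stderr, hence the 2>&1):
#   git fsck 2>&1 | list_broken_refs
```

The output is meant to be read by a person, who then decides which of
the listed refs (if any) to hand to "git update-ref -d", as in the loop
quoted above.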

-Peff