Re: git fsck does not check the packed-refs file

On Thu, Jan 18, 2024 at 09:02:30AM +0100, R. Diez wrote:
> Hi all:
> 
> I have been hit by an unfortunate system problem, and as a result, a
> few files in my Git repository got corrupted on my last git push. Some
> random blocks of bytes were overwritten with binary zeros, so I
> started getting weird unpacking errors etc.
> 
> It took a while to realise what the problem was. During my
> investigation, I ran "git fsck", which reported no problems, and then
> "git push" failed.
> 
> One of the very few corrupted files was packed-refs. This is a text
> file, so it was easy to compare it and see the corrupting binary
> zeros. But that made me wonder what "git fsck" checks.

Can you maybe expand a bit on how you arrived at this bug? Was this a
hard crash of the system that corrupted the repository or rather
something like actual disk corruption?

I'm mostly asking because I have been fixing some sources of refdb
corruption:

  - bc22d845c4 (core.fsync: new option to harden references, 2022-03-11)
    started to fsync loose refs to disk before renaming them into place,
    released with Git v2.36.

  - ce54672f9b (refs: fix corruption by not correctly syncing
    packed-refs to disk, 2022-12-20) started to sync the `packed-refs`
    file to disk before renaming it into place, released with Git v2.40
    and backported to Git v2.39.3.

So if:

  - you use a journaling filesystem,

  - you didn't disable `core.fsync`,

  - you use Git v2.40 or newer,

then you should in theory not run into any refdb corruption anymore. At
least at GitLab.com we have not seen such corruption since these fixes
landed, whereas before we encountered it every so often.
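
The version condition above can be checked mechanically. The following
is only a minimal sketch: the function name is made up for illustration,
and it looks at nothing but the version string printed by
`git --version`. It says nothing about your filesystem or your
`core.fsync` setting.

```python
import re


def has_packed_refs_sync_fix(version_output: str) -> bool:
    """Return True for Git v2.40 or newer, or v2.39.x with x >= 3.

    `version_output` is expected to look like the output of
    `git --version`, e.g. "git version 2.43.0". The function name is
    hypothetical; it is not part of Git.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"cannot parse Git version from {version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)  # a missing patch level counts as 0
    if (major, minor) >= (2, 40):
        return True
    # The packed-refs fix was backported to the v2.39 series in v2.39.3.
    return (major, minor) == (2, 39) and patch >= 3
```

You would feed it the output of `git --version` on the machine that
writes to the repository, for example the server receiving the push.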

> I am guessing that "git fsck" does not check file packed-refs at all.
> I mean, it does not even attempt to parse it, in order to check
> whether at least the format makes any sense. Only "git push" does it.

Indeed it doesn't. While the issue is comparatively easy to spot by
manually inspecting the `packed-refs` file, I agree that it would be
great if git-fsck(1) knew how to check the refdb for consistency. This
problem is only going to get worse once the upcoming reftable backend
lands -- it is a binary format, and just opening it with a text editor
to check whether it looks sane-ish stops being a viable option here.
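
Until git-fsck(1) learns to do this, a rough sanity check of the
`packed-refs` file can be scripted. The sketch below is not what Git
does internally. It assumes the usual text layout (an optional
`# pack-refs with: ...` header, `<hex-oid> <refname>` lines, and
`^<hex-oid>` lines carrying the peeled value of the preceding ref), and
it is more lenient than Git in that it accepts a `#` line anywhere. The
function name is made up for illustration.

```python
import re

# Assumed line shapes; object IDs are 40 (SHA-1) or 64 (SHA-256) hex digits.
REF_LINE = re.compile(rb"([0-9a-f]{40}|[0-9a-f]{64}) (\S+)")
PEELED_LINE = re.compile(rb"\^([0-9a-f]{40}|[0-9a-f]{64})")


def check_packed_refs(data: bytes) -> list[str]:
    """Return human-readable problems found in packed-refs content."""
    problems = []
    nul_count = data.count(b"\0")
    if nul_count:
        problems.append(f"file contains {nul_count} NUL byte(s)")
    if data and not data.endswith(b"\n"):
        problems.append("file does not end with a newline")

    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline

    previous_was_ref = False
    for number, line in enumerate(lines, start=1):
        if line.startswith(b"#"):
            # Header or comment line; accepted anywhere (an assumption).
            previous_was_ref = False
        elif REF_LINE.fullmatch(line):
            previous_was_ref = True
        elif PEELED_LINE.fullmatch(line):
            if not previous_was_ref:
                problems.append(
                    f"line {number}: peeled entry without a preceding ref")
            previous_was_ref = False
        else:
            problems.append(f"line {number}: unparseable entry {line[:40]!r}")
            previous_was_ref = False
    return problems
```

Reading the file as bytes (for example with
`pathlib.Path(".git/packed-refs").read_bytes()`) matters here, because
the kind of corruption described in this thread, runs of binary zeros,
would otherwise be mangled or rejected by text decoding. An empty result
only means the file looks well-formed; it does not prove that the
object IDs exist in the object database.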

In fact, I already planned to introduce such consistency checks for the
refdb soonish. Once the reftable backend is upstream I will focus more
on additional tooling to support it, and extending our consistency
checks is one of the first items on my todo list here.

> What other parts of the repository does "git fsck" not check then?

There may be some metadata and cache-like data structures that we don't
check, but the object database is checked by default.

> The repository check is suspiciously fast. Is there a slow way to
> check that a repository is fine? I mean, something along the lines of
> checking whether every commit can be checked out without problems.

Other than running `git fsck --full --strict`: not that I'm aware of.
And `--full` isn't even needed because it's the default.
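
If you want to run that check from a script, something like the
following would do. It is a small sketch: the helper names are made up,
and it relies only on `git -C <path> fsck --strict` and on git-fsck(1)
exiting with a non-zero status when it finds problems. As discussed
above, a clean result still says nothing about `packed-refs`.

```python
import subprocess


def fsck_command(repo: str, strict: bool = True) -> list[str]:
    """Build the argv for a full object check (`--full` is the default)."""
    command = ["git", "-C", repo, "fsck"]
    if strict:
        command.append("--strict")
    return command


def run_fsck(repo: str) -> bool:
    """Run git fsck in `repo`; print its output and return False on failure."""
    result = subprocess.run(fsck_command(repo), capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout + result.stderr)
    return result.returncode == 0
```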

Patrick


