Re: [RFC] Implement consistency check for packed refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 16, 2024 at 09:58:13PM +0800, shejialuo wrote:
> Hi all,
> 
> At current, I have already implemented the consistency check for files
> backend. My next step is to implement the consistency check for packed
> refs. And during these days, I have learned the source code of the
> "refs/packed-backend.c".

Great. I'm also starting to work into the direction of consistency
checks for reftables right now, which requires a couple more changes for
the reftable library to expose reftable blocks. I'll probably get to it
early in the next release cycle.

> The current "git-fsck(1)" implicitly checks the packed refs by calling
> the method `refs/packed-backend.c::packed_refs_lock` which alls
> `get_snapshot` and `create_snapshot`.
> 
> In the `create_snapshot` function, it will check some consistency of the
> "packed-refs" file, if anything is wrong, the program will die.
> "git-fsck(1)" relies on this to check the basic format correctness for
> "packed-refs" file. It's not suitable to just call "packed_refs_lock"
> in the code, we should not die the program.
> 
> Based on above, I have some ideas to implement the functionality. But
> before I implement the actual code, I want to share my ideas to the
> mailing list to get some feedback.
> 
> There are two ways we could add consistency check for packed refs.
> 
> 1. We simply read the raw file "packed-refs" file, check each line. Of
> course, we should also need to check whether the content is sorted.
> 2. We may reuse the data structure "struct snapshot" to do this. And we
> call "packed_refs_lock" without calling the "creating_snapshot" instead,
> we should just check. The reason why we do this is that we could reuse
> some functions defined in the "packed-backend.c" instead of repeating
> some logics. However, this way would be more complicated and require
> more effort.

Hm. I cannot really say much on this. The important part is that you
have enough information at hand to be able to implement those checks. If
you have all necessary information in both cases I would recommend to go
with the one that is simpler.

> However, one thing I am not sure is that should we lock the raw file
> when checking? From my perspective, we should lock the file because we
> are checking the consistency of it. If other processes are processing
> the "packed-refs", we may encounter inconsistency and things would be
> strange.

The consistency checks may run for an extended amount of time in repos
with huge number of refs, and locking for the whole duration may easily
cause problems.

Furthermore, "packed-refs" files are written atomically: the client
writes a new "packed-refs.lock" file, syncs it to disk and then renames
it into place. This operation doesn't invalidate any file descriptors
that refer to the old file and you can continue reading from it, so the
snapshot would remain consistent in itself. It could of course become
inconsistent with any loose refs, but that's always the case with the
"files" backend and something we'll have to accept.

So I don't see any reason why the consistency checks should lock the
"packed-refs" file.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux