Re: [RFC] Implement consistency check for packed refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 16, 2024 at 03:49:01PM +0100, Patrick Steinhardt wrote:
> On Mon, Dec 16, 2024 at 09:58:13PM +0800, shejialuo wrote:
> > Hi all,
> > 
> > At current, I have already implemented the consistency check for files
> > backend. My next step is to implement the consistency check for packed
> > refs. And during these days, I have learned the source code of the
> > "refs/packed-backend.c".
> 
> Great. I'm also starting to work into the direction of consistency
> checks for reftables right now, which requires a couple more changes for
> the reftable library to expose reftable blocks. I'll probably get to it
> early in the next release cycle.
> 

I am also happy to see this. We are gradually making things work. It's
good. And maybe we could make a GSoC project "implement consistency check
for reftable backend" if we decide to attend GSoC next year.

> > The current "git-fsck(1)" implicitly checks the packed refs by calling
> > the method `refs/packed-backend.c::packed_refs_lock` which alls
> > `get_snapshot` and `create_snapshot`.
> > 
> > In the `create_snapshot` function, it will check some consistency of the
> > "packed-refs" file, if anything is wrong, the program will die.
> > "git-fsck(1)" relies on this to check the basic format correctness for
> > "packed-refs" file. It's not suitable to just call "packed_refs_lock"
> > in the code, we should not die the program.
> > 
> > Based on above, I have some ideas to implement the functionality. But
> > before I implement the actual code, I want to share my ideas to the
> > mailing list to get some feedback.
> > 
> > There are two ways we could add consistency check for packed refs.
> > 
> > 1. We simply read the raw file "packed-refs" file, check each line. Of
> > course, we should also need to check whether the content is sorted.
> > 2. We may reuse the data structure "struct snapshot" to do this. And we
> > call "packed_refs_lock" without calling the "creating_snapshot" instead,
> > we should just check. The reason why we do this is that we could reuse
> > some functions defined in the "packed-backend.c" instead of repeating
> > some logics. However, this way would be more complicated and require
> > more effort.
> 
> Hm. I cannot really say much on this. The important part is that you
> have enough information at hand to be able to implement those checks. If
> you have all necessary information in both cases I would recommend to go
> with the one that is simpler.
> 
> > However, one thing I am not sure is that should we lock the raw file
> > when checking? From my perspective, we should lock the file because we
> > are checking the consistency of it. If other processes are processing
> > the "packed-refs", we may encounter inconsistency and things would be
> > strange.
> 
> The consistency checks may run for an extended amount of time in repos
> with huge number of refs, and locking for the whole duration may easily
> cause problems.
> 
> Furthermore, "packed-refs" files are written atomically: the client
> writes a new "packed-refs.lock" file, syncs it to disk and then renames
> it into place. This operation doesn't invalidate any file descriptors
> that refer to the old file and you can continue reading from it, so the
> snapshot would remain consistent in itself. It could of course become
> inconsistent with any loose refs, but that's always the case with the
> "files" backend and something we'll have to accept.
> 
> So I don't see any reason why the consistency checks should lock the
> "packed-refs" file.
> 

Thanks, I either don't want to lock the file because this would bring
complexity but without much benefit.

> Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux