On Mon, Dec 16, 2024 at 03:49:01PM +0100, Patrick Steinhardt wrote: > On Mon, Dec 16, 2024 at 09:58:13PM +0800, shejialuo wrote: > > Hi all, > > > > At current, I have already implemented the consistency check for files > > backend. My next step is to implement the consistency check for packed > > refs. And during these days, I have learned the source code of the > > "refs/packed-backend.c". > > Great. I'm also starting to work into the direction of consistency > checks for reftables right now, which requires a couple more changes for > the reftable library to expose reftable blocks. I'll probably get to it > early in the next release cycle. > I am also happy to see this. We are gradually making things work. It's good. And maybe we could make a GSoC project "implement consistency check for reftable backend" if we decide to attend GSoC next year. > > The current "git-fsck(1)" implicitly checks the packed refs by calling > > the method `refs/packed-backend.c::packed_refs_lock` which alls > > `get_snapshot` and `create_snapshot`. > > > > In the `create_snapshot` function, it will check some consistency of the > > "packed-refs" file, if anything is wrong, the program will die. > > "git-fsck(1)" relies on this to check the basic format correctness for > > "packed-refs" file. It's not suitable to just call "packed_refs_lock" > > in the code, we should not die the program. > > > > Based on above, I have some ideas to implement the functionality. But > > before I implement the actual code, I want to share my ideas to the > > mailing list to get some feedback. > > > > There are two ways we could add consistency check for packed refs. > > > > 1. We simply read the raw file "packed-refs" file, check each line. Of > > course, we should also need to check whether the content is sorted. > > 2. We may reuse the data structure "struct snapshot" to do this. And we > > call "packed_refs_lock" without calling the "creating_snapshot" instead, > > we should just check. The reason why we do this is that we could reuse > > some functions defined in the "packed-backend.c" instead of repeating > > some logics. However, this way would be more complicated and require > > more effort. > > Hm. I cannot really say much on this. The important part is that you > have enough information at hand to be able to implement those checks. If > you have all necessary information in both cases I would recommend to go > with the one that is simpler. > > > However, one thing I am not sure is that should we lock the raw file > > when checking? From my perspective, we should lock the file because we > > are checking the consistency of it. If other processes are processing > > the "packed-refs", we may encounter inconsistency and things would be > > strange. > > The consistency checks may run for an extended amount of time in repos > with huge number of refs, and locking for the whole duration may easily > cause problems. > > Furthermore, "packed-refs" files are written atomically: the client > writes a new "packed-refs.lock" file, syncs it to disk and then renames > it into place. This operation doesn't invalidate any file descriptors > that refer to the old file and you can continue reading from it, so the > snapshot would remain consistent in itself. It could of course become > inconsistent with any loose refs, but that's always the case with the > "files" backend and something we'll have to accept. > > So I don't see any reason why the consistency checks should lock the > "packed-refs" file. > Thanks, I either don't want to lock the file because this would bring complexity but without much benefit. > Patrick