On Wed, Mar 06, 2024 at 09:20:36PM +0800, shejialuo wrote:
> Hi All,
>
> I am interested in "Implement consistency checks for refs" GSoC idea.
> However, implementing a feature is much harder. So I wanna ask you some
> questions to better work on.

Sure!

> As [1] shows, I think the idea is easy to understand. We need to ensure
> the consistency of the refs. The current `git-fsck` only checks the
> connectivity from ref to the object file. There is a possibility that ref
> itself could be corrupted. And we should avoid it through this project.

I know this is splitting hairs, but git-fsck(1) doesn't give us the
tools to avoid corruption. It only gives us the tools to detect it after
the fact.

> I have read some source codes. Based on what I have learned, I know
> there are two backends. One is file and another is reftable. I have
> no idea about the reftable currently. So at now, I will focus on the
> file backend.

Yeah, the "reftable" backend is new in the Git v2.45 release cycle, so
it's totally expected that most people have no idea about it. It's also
part of the motivation for this project though: as you noted, it is a
binary format and thus not as readily parseable by a human as the old
"files" backend. This makes it much more important to provide the
tooling to detect whether things look as expected.

> I think the principle behind the `git-fsck` is that it will traverse
> every object file, read its content and use SHA-1 to hash the content
> and compare the value with the stored ref value. So if we want to add
> consistency checks for refs. We may need to add a new file to store the
> last commit state (not only last commit state, do we need to consider
> the stash state). However, from my perspective, it's a bad idea to use a
> file to store the refs' states and we cannot use object file to check
> whether the ref is corrupted.

I agree 100% -- tracking ref states in a secondary database is not a
good idea.

> So this is my first question, what mechanism should we use to provide
> consistency? And to what extent for the consistency. And I think this
> mechanism should be general for both text-based and binary-based refs.

The exact extent will need some discussion. What's clear is that it does
not need to be perfect from the beginning, and we are sure to discover
more checks over time that may make sense. Some ideas off the top of my
head:

  - generic

    - Ensure that all ref names are conformant.

    - Ensure that there are no directory/file conflicts for the ref
      names.

  - files

    - Ensure that "packed-refs" is well-formatted.

    - Ensure that refs in "packed-refs" are ordered lexicographically.

    - Check for corrupted loose refs in "refs/".

  - reftable

    - Ensure that there are no garbage files in "reftable/".

    - Ensure that "tables.list" is well-formatted.

    - Ensure that each table is well-formatted.

    - Ensure that refs in each table are ordered correctly.

This list is not exhaustive; there may of course be other checks that
make sense. Any additional ideas from you or other interested students
are welcome.

For what it's worth, not all of the checks need to be implemented as
part of GSoC. At a minimum, it should result in the infra to allow for
backend-specific checks and a couple of checks for at least one of the
backends.

> And I have a more general question, I think I need understand `fsck.c`
> and of course the reftable format. However, I am confused whether I need
> to understand the ref internal. And could you please provide me more
> information to make this idea more clear.

You will certainly need to learn about ref internals a bit. There are
some common rules and restrictions that are important in order to figure
out what we want to check in the first place.

Understanding the "reftable" format would be great, but you may also get
away with only implementing generic or "files"-backend specific
consistency checks. This depends on the scope you are aiming for.

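To make a few of the ideas from the list above more concrete, here is a
rough sketch in Python. Git itself is written in C, so this is for
illustration only and not a proposed implementation. All function names
are hypothetical and do not exist in Git. The ref name rules are a
simplified subset of what git-check-ref-format(1) documents, and the
"packed-refs" parsing assumes "<oid> <refname>" entries with an optional
"#" header line and "^<oid>" peeled lines.

```python
from typing import Iterable, List

# Hypothetical helpers for illustration; none of these exist in Git.

# Characters that git-check-ref-format(1) forbids anywhere in a ref name.
FORBIDDEN_CHARS = set(" ~^:?*[\\")


def check_refname(name: str) -> List[str]:
    """Return problems found in one ref name (simplified subset of the rules)."""
    problems = []
    if name == "@":
        problems.append("must not be the single character '@'")
    if ".." in name or "@{" in name:
        problems.append("contains '..' or '@{'")
    if name.endswith("."):
        problems.append("ends with '.'")
    if any(c in FORBIDDEN_CHARS or ord(c) < 0x20 or ord(c) == 0x7F for c in name):
        problems.append("contains a forbidden character")
    for component in name.split("/"):
        # An empty component means a leading, trailing or doubled slash.
        if component == "":
            problems.append("empty path component")
        elif component.startswith(".") or component.endswith(".lock"):
            problems.append(f"bad component '{component}'")
    return problems


def find_dir_file_conflicts(names: Iterable[str]) -> List[str]:
    """Return ref names that are also a directory prefix of another ref."""
    known = set(names)
    conflicts = set()
    for name in known:
        parts = name.split("/")
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in known:
                conflicts.add(prefix)
    return sorted(conflicts)


def check_packed_refs_sorted(lines: Iterable[str]) -> List[str]:
    """Report "packed-refs" entries that do not sort after their predecessor."""
    problems = []
    previous = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, the header comment and peeled lines ("^<oid>").
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        oid, sep, refname = line.partition(" ")
        if not sep or not refname:
            problems.append(f"line {lineno}: malformed entry")
            continue
        key = refname.encode()  # compare byte-wise
        if previous is not None and key <= previous:
            problems.append(f"line {lineno}: '{refname}' is out of order")
        previous = key
    return problems
```

Each check returns a list of problem descriptions instead of stopping at
the first one, which matches the "detect after the fact" role of fsck:
one run can report everything it finds. A backend-specific checker could
then be a collection of such functions, with the generic ones shared
between backends.
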
Patrick

> Thanks,
> Jialuo
>
> [1] https://lore.kernel.org/git/ZakIPEytlxHGCB9Y@tanuki/