On Fri, Dec 04, 2020 at 01:48:31PM +0100, René Scharfe wrote: > Am 03.12.20 um 17:16 schrieb Derrick Stolee via GitGitGadget: > > I was thinking about file formats recently and realized that the "chunks" > > that are common to the commit-graph and multi-pack-index could inform future > > file formats. To make that process easier, let's combine the process of > > writing and reading chunks into a common API that both of these existing > > formats use. > > > > There is some extra benefit immediately: the writing and reading code for > > each gets a bit cleaner. Also, there were different checks in each that made > > the process more robust. Now, these share a common set of checks. > > > Documentation/technical/chunk-format.txt | 54 ++ > > .../technical/commit-graph-format.txt | 3 + > > Documentation/technical/pack-format.txt | 3 + > > Makefile | 1 + > > chunk-format.c | 105 ++++ > > chunk-format.h | 69 +++ > > commit-graph.c | 298 ++++++----- > > midx.c | 466 ++++++++---------- > > t/t5318-commit-graph.sh | 2 +- > > t/t5319-multi-pack-index.sh | 6 +- > > 10 files changed, 623 insertions(+), 384 deletions(-) > > 623-384-54-3-3-1-69-2-6 = 101 > > So if we ignore changes to documentation, headers, tests and build > script this spends ca. 100 more lines of code than the current version. > That's roughly the size of the new file chunk-format.c -- from this > bird's-eye-view the new API seems to be pure overhead. > > In the new code I see several magic numbers, use of void pointers and > casting as well as repetition -- is this really going in the right > direction? I get the feeling that YAGNI. I think that Stolee is going in the right direction. I suggested earlier in the thread making a new "chunkfile" type which can handle allocating new chunks, writing their tables of contents, and so on. So, I think that we should pursues that direction a little further before deciding whether or not this is worth continuing. My early experiments showed that it does add a little more code to the chunk-format.{c,h} files, but you get negative diffs in midx.c and commit-graph.c, which is more in line with what I would expect from this series. I do think that the "overhead" here is more tolerable than we might think; I'd rather have a well-documented "chunkfile" implementation written once and called twice, than two near-identical implementations of _writing_ the chunks / table of contents at each of the call sites. So, even if this does end up being a net-lines-added kind of diff, I'd still say that it's worth it. With regards to the "YAGNI" comment... I do have thoughts about extending the reachability bitmap format to use chunks (of course, this would break compatibility with JGit, and it isn't something that I plan to do in the short-term, or even necessarily in the future). In any event, I'm sure that this won't be these two won't be the last chunk-based formats that we have in Git. > René Thanks, Taylor