On 12/4/2020 7:48 AM, René Scharfe wrote:
> On 03.12.20 at 17:16, Derrick Stolee via GitGitGadget wrote:
...
>>  Documentation/technical/chunk-format.txt |  54 ++
>>  .../technical/commit-graph-format.txt    |   3 +
>>  Documentation/technical/pack-format.txt  |   3 +
>>  Makefile                                 |   1 +
>>  chunk-format.c                           | 105 ++++
>>  chunk-format.h                           |  69 +++
>>  commit-graph.c                           | 298 ++++++-----
>>  midx.c                                   | 466 ++++++++----------
>>  t/t5318-commit-graph.sh                  |   2 +-
>>  t/t5319-multi-pack-index.sh              |   6 +-
>>  10 files changed, 623 insertions(+), 384 deletions(-)
>
> 623-384-54-3-3-1-69-2-6 = 101
>
> So if we ignore changes to documentation, headers, tests and build
> script this spends ca. 100 more lines of code than the current version.
> That's roughly the size of the new file chunk-format.c -- from this
> bird's-eye-view the new API seems to be pure overhead.

It is overhead in terms of lines of code, but many of those lines are
function prototypes or lines containing only "{" or "}". So yes, the
code files are a bit longer, but the amount of executed code is not
meaningfully different.

Extra lines of code are an expected cost of refactoring. The remaining
question is: "Is it worth the cost?" I believe it is.

> In the new code I see several magic numbers, use of void pointers and
> casting as well as repetition -- is this really going in the right
> direction? I get the feeling that YAGNI.

Void pointers are a cost of abstraction in C, and we already use them
all over the codebase.

You (and Junio) are right to point out my magic numbers. Those should
be replaced with something better where possible.

As for YAGNI, I doubt that very much. First, we have already seen
extensions to the commit-graph that added several new chunks, and
plugging into this (documented) API should be easier than the previous
ad-hoc mechanism. I've CC'd Abhishek to get his opinion, since he
recently added chunks to the commit-graph file.
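
To make "plugging in" concrete, here is a minimal, self-contained C
sketch of the kind of shared table-of-contents handling that both
formats could reuse. It is not the interface in chunk-format.h: the
names struct chunk_info, write_chunk_toc() and find_chunk() are
hypothetical, and the layout assumes 12-byte rows (a 4-byte chunk ID
plus an 8-byte offset, both big-endian) ending in a row with ID 0,
similar to the chunk lookup table described in commit-graph-format.txt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 4-byte chunk ID + 8-byte offset per TOC row. */
#define CHUNK_TOC_ENTRY_SIZE 12

/* Hypothetical description of one chunk a writer wants to emit. */
struct chunk_info {
	uint32_t id;   /* nonzero; ID 0 is reserved for the terminator row */
	uint64_t size; /* number of bytes of chunk data */
};

static void put_be32(unsigned char *p, uint32_t v)
{
	int i;
	for (i = 3; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

static void put_be64(unsigned char *p, uint64_t v)
{
	int i;
	for (i = 7; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

static uint32_t get_be32(const unsigned char *p)
{
	uint32_t v = 0;
	int i;
	for (i = 0; i < 4; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;
	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Write nr TOC rows plus a terminator row into 'out' (which must hold
 * (nr + 1) * CHUNK_TOC_ENTRY_SIZE bytes). Chunk data is assumed to start
 * at 'data_start' and be laid out back to back. Returns bytes written.
 */
size_t write_chunk_toc(unsigned char *out, const struct chunk_info *chunks,
		       size_t nr, uint64_t data_start)
{
	uint64_t offset = data_start;
	size_t i;

	for (i = 0; i < nr; i++) {
		unsigned char *row = out + i * CHUNK_TOC_ENTRY_SIZE;
		assert(chunks[i].id != 0);
		put_be32(row, chunks[i].id);
		put_be64(row + 4, offset);
		offset += chunks[i].size;
	}
	/* Terminator: ID 0, offset marks the end of the last chunk. */
	put_be32(out + nr * CHUNK_TOC_ENTRY_SIZE, 0);
	put_be64(out + nr * CHUNK_TOC_ENTRY_SIZE + 4, offset);
	return (nr + 1) * CHUNK_TOC_ENTRY_SIZE;
}

/*
 * Find chunk 'id' in a TOC of at most 'max_rows' rows inside a file of
 * 'file_size' bytes. On success, sets *offset and *size (derived from the
 * next row's offset) and returns 0. Returns -1 if the chunk is missing or
 * the TOC fails the shared sanity checks.
 */
int find_chunk(const unsigned char *toc, size_t max_rows, uint64_t file_size,
	       uint32_t id, uint64_t *offset, uint64_t *size)
{
	size_t i;

	for (i = 0; i < max_rows; i++) {
		const unsigned char *row = toc + i * CHUNK_TOC_ENTRY_SIZE;
		uint32_t this_id = get_be32(row);
		uint64_t start = get_be64(row + 4), end;

		if (this_id == 0)
			return -1; /* reached the terminator: not found */
		if (i + 1 >= max_rows)
			return -1; /* no room for a next row: malformed */
		end = get_be64(row + CHUNK_TOC_ENTRY_SIZE + 4);
		if (end < start || end > file_size)
			return -1; /* offsets out of order or past EOF */
		if (this_id == id) {
			*offset = start;
			*size = end - start;
			return 0;
		}
	}
	return -1;
}
```

In this sketch a new format only describes its chunks as (ID, size)
pairs; the offset arithmetic and the sanity checks (terminator present,
offsets non-decreasing and inside the file) live in one place. Because
find_chunk() derives each chunk's size from the following row, it also
hands the caller the size needed for run-time bounds checks during
lookups.
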
Apart from the fact that this series conflicts with Abhishek's series
(which I will fix), it would be good to hear whether he likes this
model.

>> I was thinking about file formats recently and realized that the "chunks"
>> that are common to the commit-graph and multi-pack-index could inform future
>> file formats. To make that process easier, let's combine the process of
>> writing and reading chunks into a common API that both of these existing
>> formats use.

And another point on YAGNI: I'm literally prototyping a new file format
and want to use this API to build it instead of repeating myself.

Specifically, I noticed that the commit-graph and multi-pack-index code
protected the file format in different, inconsistent ways during writes
and reads. This leads to...

>> There is some extra benefit immediately: the writing and reading code for
>> each gets a bit cleaner. Also, there were different checks in each that made
>> the process more robust. Now, these share a common set of checks.

...my point that combining these checks makes both codepaths slightly
more robust.

I didn't even include the potential extension of storing the size of
each chunk in "struct commit_graph" and "struct multi_pack_index" for
run-time bounds checks during lookups. That seemed like too much new
behavior for a series that is intended only as a refactoring.

Thanks,
-Stolee