Re: [PATCH 00/15] Refactor chunk-format into an API

Derrick Stolee <stolee@xxxxxxxxx> · Fri, 4 Dec 2020 08:57:34 -0500

On 12/4/2020 7:48 AM, René Scharfe wrote:
> Am 03.12.20 um 17:16 schrieb Derrick Stolee via GitGitGadget:
...
>>  Documentation/technical/chunk-format.txt      |  54 ++
>>  .../technical/commit-graph-format.txt         |   3 +
>>  Documentation/technical/pack-format.txt       |   3 +
>>  Makefile                                      |   1 +
>>  chunk-format.c                                | 105 ++++
>>  chunk-format.h                                |  69 +++
>>  commit-graph.c                                | 298 ++++++-----
>>  midx.c                                        | 466 ++++++++----------
>>  t/t5318-commit-graph.sh                       |   2 +-
>>  t/t5319-multi-pack-index.sh                   |   6 +-
>>  10 files changed, 623 insertions(+), 384 deletions(-)
> 
> 623-384-54-3-3-1-69-2-6 = 101
> 
> So if we ignore changes to documentation, headers, tests and build
> script this spends ca. 100 more lines of code than the current version.
> That's roughly the size of the new file chunk-format.c -- from this
> bird's-eye-view the new API seems to be pure overhead.

Overhead in terms of lines of code, but many of those are function
prototypes and single lines containing only "{" and "}". So yes,
the code files are a bit longer, but the amount of executed code is
not meaningfully different.

Extra lines of code are an expected cost of refactoring. The remaining
question is, "is it worth the cost?" I believe it is.

> In the new code I see several magic numbers, use of void pointers and
> casting as well as repetition -- is this really going in the right
> direction?  I get the feeling that YAGNI.

void pointers are a cost of abstraction in C that we use all over the
codebase.

You (and Junio) are right to point out my magic numbers. Those should
be replaced with something better when possible.

As for YAGNI, I doubt that very much. First, we have already seen
extensions to the commit-graph that added several new chunks, and
plugging into this (documented) API should be easier than the previous
ad-hoc mechanism.
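
To illustrate the kind of shape I mean, here is a rough sketch. It is
not the API from this series: "struct chunk_desc", "chunk_write_fn",
"write_all_chunks()" and "write_one_byte()" are hypothetical names made
up for this example, and the caller is assumed to supply a large enough
buffer. The void pointer is what lets one loop serve formats with
different context structs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: 'data' is whatever context the format needs. */
typedef size_t (*chunk_write_fn)(unsigned char *out, void *data);

/* Hypothetical descriptor: one entry per chunk in the file. */
struct chunk_desc {
	uint32_t id;
	chunk_write_fn write_fn;
};

/*
 * Run each writer in order, recording where each chunk starts.
 * 'offsets' must have room for nr + 1 entries; the last entry marks
 * the end of the final chunk. 'out' is assumed to be large enough.
 * Returns the total number of bytes written.
 */
static size_t write_all_chunks(const struct chunk_desc *chunks, size_t nr,
			       unsigned char *out, size_t *offsets,
			       void *data)
{
	size_t pos = 0, i;

	for (i = 0; i < nr; i++) {
		offsets[i] = pos;
		pos += chunks[i].write_fn(out + pos, data);
	}
	offsets[nr] = pos;
	return pos;
}

/* Example writer for a made-up format: emits one byte from the context. */
static size_t write_one_byte(unsigned char *out, void *data)
{
	*out = *(const unsigned char *)data;
	return 1;
}
```

With a table like this, adding a chunk means adding one entry and one
writer function, while the offset bookkeeping stays in a single place.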

I've CC'd Abhishek to get his opinion, since he's recently added chunks
to the commit-graph file. Outside of the fact that this series conflicts
with his series (which I will fix), it would be good to see if he
appreciates this model.

>> I was thinking about file formats recently and realized that the "chunks"
>> that are common to the commit-graph and multi-pack-index could inform future
>> file formats. To make that process easier, let's combine the process of
>> writing and reading chunks into a common API that both of these existing
>> formats use.

And another point on YAGNI: I'm literally prototyping a new file format and
want to use this API to build it instead of repeating myself. Specifically,
I noticed that the commit-graph and multi-pack-index were inconsistent in
how they protected the file format during writes and reads.
This leads to...

>> There is some extra benefit immediately: the writing and reading code for
>> each gets a bit cleaner. Also, there were different checks in each that made
>> the process more robust. Now, these share a common set of checks.

...my point that combining these checks makes both codepaths slightly more
robust. I didn't even include the potential extension of storing the size
of each chunk in "struct commit_graph" and "struct multi_pack_index" for
run-time bounds checks during lookups. That seemed like too much new
behavior for a series that intends to only refactor.
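
For the curious, a minimal sketch of what such a size-aware lookup
could look like. Everything here is hypothetical: "struct chunk_view"
and "chunk_get_be32()" are names invented for illustration, not fields
or helpers in "struct commit_graph" or "struct multi_pack_index", and
the chunk is assumed to hold fixed-width 4-byte big-endian entries.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: a chunk's start pointer plus the size that was
 * recorded when the table of contents was parsed.
 */
struct chunk_view {
	const unsigned char *start;
	size_t size;
};

/*
 * Read the big-endian 32-bit entry at index 'pos'.
 * Returns 0 on success, or -1 if the entry would fall outside the
 * chunk. Comparing against size / 4 avoids overflow in pos * 4.
 */
static int chunk_get_be32(const struct chunk_view *c, size_t pos,
			  uint32_t *out)
{
	const unsigned char *p;

	if (pos >= c->size / 4)
		return -1;
	p = c->start + pos * 4;
	*out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return 0;
}
```

The point is only that the size has to be kept around after parsing;
the check itself is a single comparison per lookup.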

Thanks,
-Stolee