Re: [PATCH 00/15] Refactor chunk-format into an API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 04, 2020 at 01:48:31PM +0100, René Scharfe wrote:
> Am 03.12.20 um 17:16 schrieb Derrick Stolee via GitGitGadget:
> > I was thinking about file formats recently and realized that the "chunks"
> > that are common to the commit-graph and multi-pack-index could inform future
> > file formats. To make that process easier, let's combine the process of
> > writing and reading chunks into a common API that both of these existing
> > formats use.
> >
> > There is some extra benefit immediately: the writing and reading code for
> > each gets a bit cleaner. Also, there were different checks in each that made
> > the process more robust. Now, these share a common set of checks.
>
> >  Documentation/technical/chunk-format.txt      |  54 ++
> >  .../technical/commit-graph-format.txt         |   3 +
> >  Documentation/technical/pack-format.txt       |   3 +
> >  Makefile                                      |   1 +
> >  chunk-format.c                                | 105 ++++
> >  chunk-format.h                                |  69 +++
> >  commit-graph.c                                | 298 ++++++-----
> >  midx.c                                        | 466 ++++++++----------
> >  t/t5318-commit-graph.sh                       |   2 +-
> >  t/t5319-multi-pack-index.sh                   |   6 +-
> >  10 files changed, 623 insertions(+), 384 deletions(-)
>
> 623-384-54-3-3-1-69-2-6 = 101
>
> So if we ignore changes to documentation, headers, tests and build
> script this spends ca. 100 more lines of code than the current version.
> That's roughly the size of the new file chunk-format.c -- from this
> bird's-eye-view the new API seems to be pure overhead.
>
> In the new code I see several magic numbers, use of void pointers and
> casting as well as repetition -- is this really going in the right
> direction?  I get the feeling that YAGNI.

I think that Stolee is going in the right direction. I suggested earlier
in the thread making a new "chunkfile" type which can handle allocating
new chunks, writing their tables of contents, and so on.

So, I think that we should pursues that direction a little further
before deciding whether or not this is worth continuing. My early
experiments showed that it does add a little more code to the
chunk-format.{c,h} files, but you get negative diffs in midx.c and
commit-graph.c, which is more in line with what I would expect from this
series.

I do think that the "overhead" here is more tolerable than we might
think; I'd rather have a well-documented "chunkfile" implementation
written once and called twice, than two near-identical implementations
of _writing_ the chunks / table of contents at each of the call sites.
So, even if this does end up being a net-lines-added kind of diff, I'd
still say that it's worth it.

With regards to the "YAGNI" comment... I do have thoughts about
extending the reachability bitmap format to use chunks (of course, this
would break compatibility with JGit, and it isn't something that I plan
to do in the short-term, or even necessarily in the future).

In any event, I'm sure that this won't be these two won't be the last
chunk-based formats that we have in Git.

> René

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux