Re: [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 01 Mar 2022 18:23:43 +0100

On Mon, Feb 28 2022, Derrick Stolee via GitGitGadget wrote:

> In particular, Git has been ignoring corrected commit dates since shortly
> after they were introduced. This is due to a bug I introduced when trying to
> make split commit-graphs safer with mixed generation number versions. I also
> noticed an issue with the offset overflows that I only noticed after writing
> generation number v3 using a smaller offset size, actually triggering the
> bug in the test suite.

I think sans existing small issues/fixes I noted this looks good.

Just a bit on the overall direction/design. And don't worry, not the
verison v.s. chunk all over again (except notes about how the two
versions eventually interact) :)

I.e. the post-state here looks good, but I wondered about the direction
of:

 * We have a commitGraph.readChangedPaths for *reading* BIDX/BDAT, on by default
 * We have a commitgraph.generationVersion which pre this series is 1, post-2.
 * >= 2 means look at the GDAT/GDOV chunk, and when splitting/rewriting etc.
   carry them forward.

So, I wonder:

A. Do we really need these "yes I'll read this thing in the file" settings at all?

   Isn't it sufficient to have core.commitGraph=false as an escape hatch, do we
   really need to be able to selectively ignore individual parts of the file on-disk?

B. For a "selective ignore" the commitGraph.readChangedPaths=BOOL makes sense, but
   given the follow-up series it seems odd to end up with a commitgraph.generationVersion=3
   which bumps the top-level version of the commit-graph.

I.e. commitgraph.generationVersion=2 *is* optional since the GDAT/GCOV
chunks can be ignored, but commitgraph.generationVersion=3 is *not* since
it'll also bump the format version to v2 (not v3!).

So yeah, like I said I'm being quiet about the top-level version
v.s. chunks here blah blah :)

But for the end-state you want (as I understand it) wouldn't this make
more sense:

1. Just say we have write settings + "core.commitGraph=false" escape
   hatch, no "selective read".

2. Make commitgraph.generationVersion=2 an alias for a more obviously named
   commitGraph.readGenerationData=true (optional). Maybe deprecate
   "commitGraph.readGenerationData" (say "error: just don't write it then")

3. Never have a commitgraph.generationVersion=3 setting, but instead
   add say a core.writeCommitGraphVersion=2.

   We'd thus not be conflating "commitgraph.generationVersion" which *is*
   optional and is on both the read *and* write side, with your WIP
   commitgraph.generationVersion=3 which also changes how writes happen,
   but is not at all optional for reads. You'll either support it or get a
   warning() when loading the graph.

I think this should all be OK, and I don't think it conflicts with your
stated preferences about the top-level version v.s. optional/redundant
chunks.

It makes things simpler since we'd always read data we find, with no
"maybe ignore it if it's there" settings.

And it avoids user confusion of e.g. getting an error about not
understanding a v2 graph after setting a commitgraph.generationVersion=3
setting (since one is 0-indexed...), and having "versions" in the same
config setting being first optionalon read/write, to very much not
optional.