On Tue, Feb 02, 2021 at 03:01:17AM +0000, Derrick Stolee via GitGitGadget wrote: > Here is a bugfix for the recently-landed-in-next generation v2 topic > (ak/corrected-commit-date). > > This was occasionally hitting us when computing new commit-graphs on > existing repositories with the new bits. It was very hard to reproduce, and > it turns out to be due to not parsing commits before accessing generation > number data. Doing so in the right place demonstrates the bug of recomputing > the corrected commit date even for commits in lower layers with computed > values. > > The fix is split into these steps: > > 1. Parse commits more often before accessing their data. (This allows the > bug to be demonstrated in the test suite.) > 2. Check the full commit-graph chain for generation data chunks. > 3. Don't compute corrected commit dates if the lower layers do not support > them. > 4. Parse the commit-graph file more often. > > Thanks, -Stolee > Thanks for the fast bug-fix. It must have been hard to track down why as we weren't able to reproduce this behaviour before. As Taylor said before, this patch is a clear improvement in readability and correctness. Thanks - Abhishek > > Updates in v2 > ============= > > * Fixed some typos or other clarifications in commit messages. > > * The loop assigning read_generation_data is skipped if they already all > agree with value 1. > > * I split compute_generation_numbers into two methods. This essentially > splits the previous patch 4 into patches 4 & 5 here. The new patch 4 just > splits the logic as-is, then the new patch 5 does the re-initialization > of generation values when in the upgrade scenario. > > Derrick Stolee (6): > commit-graph: use repo_parse_commit > commit-graph: always parse before commit_graph_data_at() > commit-graph: validate layers for generation data > commit-graph: compute generations separately > commit-graph: be extra careful about mixed generations > commit-graph: prepare commit graph > > commit-graph.c | 138 +++++++++++++++++++++++++++++----------- > commit.h | 5 +- > t/t5318-commit-graph.sh | 21 ++++++ > 3 files changed, 125 insertions(+), 39 deletions(-) > > > base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-850%2Fderrickstolee%2Fgen-v2-upgrade-fix-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-850/derrickstolee/gen-v2-upgrade-fix-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/850 > > Range-diff vs v1: > > 1: 9c605c99f66 = 1: 9c605c99f66 commit-graph: use repo_parse_commit > 2: 82afa811dff ! 2: 454b183b9ba commit-graph: always parse before commit_graph_data_at() > @@ Commit message > problems when writing a commit-graph with corrected commit dates based > on a commit-graph without them. > > - It has been difficult to identify the issue here becaus it was so hard > + It has been difficult to identify the issue here because it was so hard > to reproduce. It relies on this uninitialized data having a non-zero > value, but also on specifically in a way that overwrites the existing > data. > 3: d554fa30660 ! 3: 3d223fa2156 commit-graph: validate layers for generation data > @@ commit-graph.c: static struct commit_graph *load_commit_graph_chain(struct repos > > - if (!g) > - return; > -- > -- read_generation_data = !!g->chunk_generation_data; > + while (read_generation_data && p) { > + read_generation_data = p->read_generation_data; > + p = p->base_graph; > + } > > +- read_generation_data = !!g->chunk_generation_data; > ++ if (read_generation_data) > ++ return 1; > + > while (g) { > - g->read_generation_data = read_generation_data; > +- g->read_generation_data = read_generation_data; > ++ g->read_generation_data = 0; > g = g->base_graph; > } > + > -+ return read_generation_data; > ++ return 0; > } > > struct commit_graph *read_commit_graph_one(struct repository *r, > -: ----------- > 4: 05248ff222f commit-graph: compute generations separately > 4: b267a9653a7 ! 5: 9bccee8fb63 commit-graph: be extra careful about mixed generations > @@ Commit message > existing commit-graph data has no corrected commit dates. > > While this improves our situation somewhat, we have not completely > - solved the issue for correctly computing generation numbers for mixes > + solved the issue for correctly computing generation numbers for mixed > layers. That follows in the next change. > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph > _("Computing commit graph generation numbers"), > ctx->commits.nr); > + > -+ if (ctx->write_generation_data && !ctx->trust_generation_numbers) { > ++ if (!ctx->trust_generation_numbers) { > + for (i = 0; i < ctx->commits.nr; i++) { > + struct commit *c = ctx->commits.list[i]; > + repo_parse_commit(ctx->r, c); > @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph > + > for (i = 0; i < ctx->commits.nr; i++) { > struct commit *c = ctx->commits.list[i]; > - uint32_t level; > -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx) > - corrected_commit_date = commit_graph_data_at(parent->item)->generation; > - > - if (level == GENERATION_NUMBER_ZERO || > -- corrected_commit_date == GENERATION_NUMBER_ZERO) { > -+ (ctx->write_generation_data && > -+ corrected_commit_date == GENERATION_NUMBER_ZERO)) { > - all_parents_computed = 0; > - commit_list_insert(parent->item, &list); > - break; > -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx) > - max_level = GENERATION_NUMBER_V1_MAX - 1; > - *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; > - > -- if (current->date && current->date > max_corrected_commit_date) > -- max_corrected_commit_date = current->date - 1; > -- commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; > -- > -- if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) > -- ctx->num_generation_data_overflows++; > -+ if (ctx->write_generation_data) { > -+ timestamp_t cur_g; > -+ if (current->date && current->date > max_corrected_commit_date) > -+ max_corrected_commit_date = current->date - 1; > -+ cur_g = commit_graph_data_at(current)->generation > -+ = max_corrected_commit_date + 1; > -+ if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) > -+ ctx->num_generation_data_overflows++; > -+ } > - } > - } > - } > + timestamp_t corrected_commit_date; > @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, > } else > ctx->num_commit_graphs_after = 1; > @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, > - validate_mixed_generation_chain(ctx->r->objects->commit_graph); > + ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph); > > - compute_generation_numbers(ctx); > - > + compute_topological_levels(ctx); > + if (ctx->write_generation_data) > 5: dddeec30ebf ! 6: 38086c85b52 commit-graph: prepare commit graph > @@ Commit message > commit-graph: prepare commit graph > > Before checking if the repository has a commit-graph loaded, be sure > - to run prepare_commit_graph(). This is necessary because without this > - instance we would not initialize the topo_levels slab for each of the > - struct commit_graphs in the chain before we start to parse the > - commits. This leads to possibly recomputing the topological levels for > - commits in lower layers even when we are adding a small number of > - commits on top. > + to run prepare_commit_graph(). This is necessary because otherwise > + the topo_levels slab is not initialized. As we compute topo_levels for > + the new commits, we iterate further into the lower layers since the > + first visit to each commit looks as though the topo_level is not > + populated. > > By properly initializing the topo_slab, we fix the previously broken > case of a split commit graph where a base layer has the > > -- > gitgitgadget