[PATCH v2 0/6] Generation Number v2: Fix a tricky split graph bug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is a bugfix for the recently-landed-in-next generation v2 topic
(ak/corrected-commit-date).

This was occasionally hitting us when computing new commit-graphs on
existing repositories with the new bits. It was very hard to reproduce, and
it turns out to be due to not parsing commits before accessing generation
number data. Doing so in the right place demonstrates the bug of recomputing
the corrected commit date even for commits in lower layers with computed
values.

The fix is split into these steps:

 1. Parse commits more often before accessing their data. (This allows the
    bug to be demonstrated in the test suite.)
 2. Check the full commit-graph chain for generation data chunks.
 3. Don't compute corrected commit dates if the lower layers do not support
    them.
 4. Parse the commit-graph file more often.

Thanks, -Stolee


Updates in v2
=============

 * Fixed some typos or other clarifications in commit messages.

 * The loop assigning read_generation_data is skipped if they already all
   agree with value 1.

 * I split compute_generation_numbers into two methods. This essentially
   splits the previous patch 4 into patches 4 & 5 here. The new patch 4 just
   splits the logic as-is, then the new patch 5 does the re-initialization
   of generation values when in the upgrade scenario.

Derrick Stolee (6):
  commit-graph: use repo_parse_commit
  commit-graph: always parse before commit_graph_data_at()
  commit-graph: validate layers for generation data
  commit-graph: compute generations separately
  commit-graph: be extra careful about mixed generations
  commit-graph: prepare commit graph

 commit-graph.c          | 138 +++++++++++++++++++++++++++++-----------
 commit.h                |   5 +-
 t/t5318-commit-graph.sh |  21 ++++++
 3 files changed, 125 insertions(+), 39 deletions(-)


base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-850%2Fderrickstolee%2Fgen-v2-upgrade-fix-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-850/derrickstolee/gen-v2-upgrade-fix-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/850

Range-diff vs v1:

 1:  9c605c99f66 = 1:  9c605c99f66 commit-graph: use repo_parse_commit
 2:  82afa811dff ! 2:  454b183b9ba commit-graph: always parse before commit_graph_data_at()
     @@ Commit message
          problems when writing a commit-graph with corrected commit dates based
          on a commit-graph without them.
      
     -    It has been difficult to identify the issue here becaus it was so hard
     +    It has been difficult to identify the issue here because it was so hard
          to reproduce. It relies on this uninitialized data having a non-zero
          value, but also on specifically in a way that overwrites the existing
          data.
 3:  d554fa30660 ! 3:  3d223fa2156 commit-graph: validate layers for generation data
     @@ commit-graph.c: static struct commit_graph *load_commit_graph_chain(struct repos
       
      -	if (!g)
      -		return;
     --
     --	read_generation_data = !!g->chunk_generation_data;
      +	while (read_generation_data && p) {
      +		read_generation_data = p->read_generation_data;
      +		p = p->base_graph;
      +	}
       
     +-	read_generation_data = !!g->chunk_generation_data;
     ++	if (read_generation_data)
     ++		return 1;
     + 
       	while (g) {
     - 		g->read_generation_data = read_generation_data;
     +-		g->read_generation_data = read_generation_data;
     ++		g->read_generation_data = 0;
       		g = g->base_graph;
       	}
      +
     -+	return read_generation_data;
     ++	return 0;
       }
       
       struct commit_graph *read_commit_graph_one(struct repository *r,
 -:  ----------- > 4:  05248ff222f commit-graph: compute generations separately
 4:  b267a9653a7 ! 5:  9bccee8fb63 commit-graph: be extra careful about mixed generations
     @@ Commit message
          existing commit-graph data has no corrected commit dates.
      
          While this improves our situation somewhat, we have not completely
     -    solved the issue for correctly computing generation numbers for mixes
     +    solved the issue for correctly computing generation numbers for mixed
          layers. That follows in the next change.
      
          Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
     @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph
       					_("Computing commit graph generation numbers"),
       					ctx->commits.nr);
      +
     -+	if (ctx->write_generation_data && !ctx->trust_generation_numbers) {
     ++	if (!ctx->trust_generation_numbers) {
      +		for (i = 0; i < ctx->commits.nr; i++) {
      +			struct commit *c = ctx->commits.list[i];
      +			repo_parse_commit(ctx->r, c);
     @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph
      +
       	for (i = 0; i < ctx->commits.nr; i++) {
       		struct commit *c = ctx->commits.list[i];
     - 		uint32_t level;
     -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx)
     - 				corrected_commit_date = commit_graph_data_at(parent->item)->generation;
     - 
     - 				if (level == GENERATION_NUMBER_ZERO ||
     --				    corrected_commit_date == GENERATION_NUMBER_ZERO) {
     -+				    (ctx->write_generation_data &&
     -+				     corrected_commit_date == GENERATION_NUMBER_ZERO)) {
     - 					all_parents_computed = 0;
     - 					commit_list_insert(parent->item, &list);
     - 					break;
     -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx)
     - 					max_level = GENERATION_NUMBER_V1_MAX - 1;
     - 				*topo_level_slab_at(ctx->topo_levels, current) = max_level + 1;
     - 
     --				if (current->date && current->date > max_corrected_commit_date)
     --					max_corrected_commit_date = current->date - 1;
     --				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
     --
     --				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
     --					ctx->num_generation_data_overflows++;
     -+				if (ctx->write_generation_data) {
     -+					timestamp_t cur_g;
     -+					if (current->date && current->date > max_corrected_commit_date)
     -+						max_corrected_commit_date = current->date - 1;
     -+					cur_g = commit_graph_data_at(current)->generation
     -+					      = max_corrected_commit_date + 1;
     -+					if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
     -+						ctx->num_generation_data_overflows++;
     -+				}
     - 			}
     - 		}
     - 	}
     + 		timestamp_t corrected_commit_date;
      @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
       	} else
       		ctx->num_commit_graphs_after = 1;
     @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
      -	validate_mixed_generation_chain(ctx->r->objects->commit_graph);
      +	ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph);
       
     - 	compute_generation_numbers(ctx);
     - 
     + 	compute_topological_levels(ctx);
     + 	if (ctx->write_generation_data)
 5:  dddeec30ebf ! 6:  38086c85b52 commit-graph: prepare commit graph
     @@ Commit message
          commit-graph: prepare commit graph
      
          Before checking if the repository has a commit-graph loaded, be sure
     -    to run prepare_commit_graph(). This is necessary because without this
     -    instance we would not initialize the topo_levels slab for each of the
     -    struct commit_graphs in the chain before we start to parse the
     -    commits. This leads to possibly recomputing the topological levels for
     -    commits in lower layers even when we are adding a small number of
     -    commits on top.
     +    to run prepare_commit_graph(). This is necessary because otherwise
     +    the topo_levels slab is not initialized. As we compute topo_levels for
     +    the new commits, we iterate further into the lower layers since the
     +    first visit to each commit looks as though the topo_level is not
     +    populated.
      
          By properly initializing the topo_slab, we fix the previously broken
          case of a split commit graph where a base layer has the

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux