Re: commit-graph overflow generation chicken and egg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 05, 2022 at 05:03:44PM -0400, Will Chandler wrote:
> Hi all,
>
> I think I've reproduced this problem on v2.36.1 and main.

Great find. I played around with this a little bit locally and your
reproduction works for me on any sufficiently-large repository. Looking
around in the code, I'm pretty confident that this is the issue.

>   $ git -c commitGraph.generationVersion=2 commit-graph write --changed-paths

This step is critical: without it we never end up overwriting the
`->generation` data, and the conversion works fine.

> Because the existing commit graph did not include generation data,
> graph.read_generation_data is 0. When compute_bloom_filters executes, it
> will call fill_commit_graph_info which checks that value and falls back
> to the older generation calculation if false:
>
>   if (g->read_generation_data) {
>   	offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index);
>
>   	if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) {
>   		if (!g->chunk_generation_data_overflow)
>   			die(_("commit-graph requires overflow generation data but has none"));
>
>   		offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW;
>   		graph_data->generation = item->date + get_be64(g->chunk_generation_data_overflow + 8 * offset_pos);
>   	} else
>   		graph_data->generation = item->date + offset;
>   } else
>   	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
>
> This re-writes the commit data to:
>
>   {
>     graph_pos = 0,
>     generation = 17631
>   }

Nicely explained. To me, it seems like the problem is that we're reusing
the same slab to:

  - store data that we're going to write as a part of commit-graph
    generation, and
  - store auxiliary data about commits that we have *read* from a
    commit-graph

A complete fix might be to use a separate slab to store data that we
read from data that we are about to write, to avoid polluting the latter
with the former.

In the meantime, a more minimal fix would be to avoid reading and
overwriting the generation data where we can avoid it. AFAICT, this is
the only spot that would need to be changed. The following patch does
the trick for me:

--- >8 ---

diff --git a/bloom.c b/bloom.c
index 5e297038bb..863052fa68 100644
--- a/bloom.c
+++ b/bloom.c
@@ -30,10 +30,9 @@ static inline unsigned char get_bitmask(uint32_t pos)

 static int load_bloom_filter_from_graph(struct commit_graph *g,
 					struct bloom_filter *filter,
-					struct commit *c)
+					uint32_t graph_pos)
 {
 	uint32_t lex_pos, start_index, end_index;
-	uint32_t graph_pos = commit_graph_position(c);

 	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
@@ -203,9 +202,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	filter = bloom_filter_slab_at(&bloom_filters, c);

 	if (!filter->data) {
-		load_commit_graph_info(r, c);
-		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
-			load_bloom_filter_from_graph(r->objects->commit_graph, filter, c);
+		uint32_t pos;
+		if ((repo_find_commit_pos_in_graph(r, c, &pos)) &&
+		    (pos != COMMIT_NOT_FROM_GRAPH))
+			load_bloom_filter_from_graph(r->objects->commit_graph,
+						     filter, pos);
 	}

 	if (filter->data && filter->len)
diff --git a/commit-graph.c b/commit-graph.c
index 92d4503336..2d9fad5910 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -889,6 +889,14 @@ static int find_commit_pos_in_graph(struct commit *item, struct commit_graph *g,
 	}
 }

+int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
+				  uint32_t *pos)
+{
+	if (!prepare_commit_graph(r))
+		return 0;
+	return find_commit_pos_in_graph(c, r->objects->commit_graph, pos);
+}
+
 struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
 {
 	struct commit *commit;
diff --git a/commit-graph.h b/commit-graph.h
index 2e3ac35237..52ddfbe349 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,6 +40,9 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
  */
 int parse_commit_in_graph(struct repository *r, struct commit *item);

+int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,
+				  uint32_t *pos);
+
 /*
  * Look up the given commit ID in the commit-graph. This will only return a
  * commit if the ID exists both in the graph and in the object database such

--- 8< ---

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux