On Tue, Jul 05, 2022 at 05:03:44PM -0400, Will Chandler wrote: > Hi all, > > I think I've reproduced this problem on v2.36.1 and main. Great find. I played around with this a little bit locally and your reproduction works for me on any sufficiently-large repository. Looking around in the code, I'm pretty confident that this is the issue. > $ git -c commitGraph.generationVersion=2 commit-graph write --changed-paths This step is critical: without it we never end up overwriting the `->generation` data, and the conversion works fine. > Because the existing commit graph did not include generation data, > graph.read_generation_data is 0. When compute_bloom_filters executes, it > will call fill_commit_graph_info which checks that value and falls back > to the older generation calculation if false: > > if (g->read_generation_data) { > offset = (timestamp_t)get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); > > if (offset & CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW) { > if (!g->chunk_generation_data_overflow) > die(_("commit-graph requires overflow generation data but has none")); > > offset_pos = offset ^ CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW; > graph_data->generation = item->date + get_be64(g->chunk_generation_data_overflow + 8 * offset_pos); > } else > graph_data->generation = item->date + offset; > } else > graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2; > > This re-writes the commit data to: > > { > graph_pos = 0, > generation = 17631 > } Nicely explained. To me, it seems like the problem is that we're reusing the same slab to: - store data that we're going to write as a part of commit-graph generation, and - store auxiliary data about commits that we have *read* from a commit-graph A complete fix might be to use a separate slab to store data that we read from data that we are about to write, to avoid polluting the latter with the former. In the meantime, a more minimal fix would be to avoid reading and overwriting the generation data where we can avoid it. AFAICT, this is the only spot that would need to be changed. The following patch does the trick for me: --- >8 --- diff --git a/bloom.c b/bloom.c index 5e297038bb..863052fa68 100644 --- a/bloom.c +++ b/bloom.c @@ -30,10 +30,9 @@ static inline unsigned char get_bitmask(uint32_t pos) static int load_bloom_filter_from_graph(struct commit_graph *g, struct bloom_filter *filter, - struct commit *c) + uint32_t graph_pos) { uint32_t lex_pos, start_index, end_index; - uint32_t graph_pos = commit_graph_position(c); while (graph_pos < g->num_commits_in_base) g = g->base_graph; @@ -203,9 +202,11 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, filter = bloom_filter_slab_at(&bloom_filters, c); if (!filter->data) { - load_commit_graph_info(r, c); - if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH) - load_bloom_filter_from_graph(r->objects->commit_graph, filter, c); + uint32_t pos; + if ((repo_find_commit_pos_in_graph(r, c, &pos)) && + (pos != COMMIT_NOT_FROM_GRAPH)) + load_bloom_filter_from_graph(r->objects->commit_graph, + filter, pos); } if (filter->data && filter->len) diff --git a/commit-graph.c b/commit-graph.c index 92d4503336..2d9fad5910 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -889,6 +889,14 @@ static int find_commit_pos_in_graph(struct commit *item, struct commit_graph *g, } } +int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c, + uint32_t *pos) +{ + if (!prepare_commit_graph(r)) + return 0; + return find_commit_pos_in_graph(c, r->objects->commit_graph, pos); +} + struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id) { struct commit *commit; diff --git a/commit-graph.h b/commit-graph.h index 2e3ac35237..52ddfbe349 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -40,6 +40,9 @@ int open_commit_graph(const char *graph_file, int *fd, struct stat *st); */ int parse_commit_in_graph(struct repository *r, struct commit *item); +int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c, + uint32_t *pos); + /* * Look up the given commit ID in the commit-graph. This will only return a * commit if the ID exists both in the graph and in the object database such --- 8< --- Thanks, Taylor