On 11 May 2018 at 23:15, Derrick Stolee <dstolee@xxxxxxxxxxxxx> wrote: > When running 'git commit-graph verify', compare the contents of the > commits that are loaded from the commit-graph file with commits that are > loaded directly from the object database. This includes checking the > root tree object ID, commit date, and parents. > > Parse the commit from the graph during the initial loop through the > object IDs to guarantee we parse from the commit-graph file. > > In addition, verify the generation number calculation is correct for all > commits in the commit-graph file. > > While testing, we discovered that mutating the integer value for a > parent to be outside the accepted range causes a segmentation fault. Add > a new check in insert_parent_or_die() that prevents this fault. Check > for that error during the test, both in the typical parents and in the > list of parents for octopus merges. This paragraph and the corresponding fix and test feel like a separate patch to me. (The commit message of it could be "To test the next patch, we threw invalid data at `git commit-graph verify, and it crashed in pre-existing code, so let's fix that first" -- there is definitely a connection.) Is this important enough to fast-track to master in time for 2.18? My guess would be no. > + > + if (pos >= g->num_commits) > + die("invalide parent position %"PRIu64, pos); s/invalide/invalid/ > @@ -875,6 +879,8 @@ int verify_commit_graph(struct commit_graph *g) > return 1; > > for (i = 0; i < g->num_commits; i++) { > + struct commit *graph_commit; > + > hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); > > if (i && oidcmp(&prev_oid, &cur_oid) >= 0) > @@ -892,6 +898,10 @@ int verify_commit_graph(struct commit_graph *g) > > cur_fanout_pos++; > } > + > + graph_commit = lookup_commit(&cur_oid); > + if (!parse_commit_in_graph_one(g, graph_commit)) > + graph_report("failed to parse %s from commit-graph", oid_to_hex(&cur_oid)); > } Could this end up giving ridiculous amounts of output? It would depend on the input, I guess. > while (cur_fanout_pos < 256) { > @@ -904,5 +914,95 @@ int verify_commit_graph(struct commit_graph *g) > cur_fanout_pos++; > } > > + if (verify_commit_graph_error) > + return 1; Well, here we give up before running into *too* much problem. > + for (i = 0; i < g->num_commits; i++) { > + struct commit *graph_commit, *odb_commit; > + struct commit_list *graph_parents, *odb_parents; > + int num_parents = 0; > + > + hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i); > + > + graph_commit = lookup_commit(&cur_oid); > + odb_commit = (struct commit *)create_object(cur_oid.hash, alloc_commit_node()); > + if (parse_commit_internal(odb_commit, 0, 0)) { > + graph_report("failed to parse %s from object database", oid_to_hex(&cur_oid)); > + continue; > + } > + > + if (oidcmp(&get_commit_tree_in_graph_one(g, graph_commit)->object.oid, > + get_commit_tree_oid(odb_commit))) > + graph_report("root tree object ID for commit %s in commit-graph is %s != %s", > + oid_to_hex(&cur_oid), > + oid_to_hex(get_commit_tree_oid(graph_commit)), > + oid_to_hex(get_commit_tree_oid(odb_commit))); > + > + if (graph_commit->date != odb_commit->date) > + graph_report("commit date for commit %s in commit-graph is %"PRItime" != %"PRItime"", > + oid_to_hex(&cur_oid), > + graph_commit->date, > + odb_commit->date); > + > + > + graph_parents = graph_commit->parents; > + odb_parents = odb_commit->parents; > + > + while (graph_parents) { > + num_parents++; > + > + if (odb_parents == NULL) > + graph_report("commit-graph parent list for commit %s is too long (%d)", > + oid_to_hex(&cur_oid), > + num_parents); > + > + if (oidcmp(&graph_parents->item->object.oid, &odb_parents->item->object.oid)) > + graph_report("commit-graph parent for %s is %s != %s", > + oid_to_hex(&cur_oid), > + oid_to_hex(&graph_parents->item->object.oid), > + oid_to_hex(&odb_parents->item->object.oid)); > + > + graph_parents = graph_parents->next; > + odb_parents = odb_parents->next; > + } > + > + if (odb_parents != NULL) > + graph_report("commit-graph parent list for commit %s terminates early", > + oid_to_hex(&cur_oid)); > + > + if (graph_commit->generation) { > + uint32_t max_generation = 0; > + graph_parents = graph_commit->parents; > + > + while (graph_parents) { > + if (graph_parents->item->generation == GENERATION_NUMBER_ZERO || > + graph_parents->item->generation == GENERATION_NUMBER_INFINITY) > + graph_report("commit-graph has valid generation for %s but not its parent, %s", > + oid_to_hex(&cur_oid), > + oid_to_hex(&graph_parents->item->object.oid)); > + if (graph_parents->item->generation > max_generation) > + max_generation = graph_parents->item->generation; > + graph_parents = graph_parents->next; > + } > + > + if (max_generation == GENERATION_NUMBER_MAX) > + max_generation--; I'm not too familiar with these concepts. Is this a trick in preparation for this: > + > + if (graph_commit->generation != max_generation + 1) Any way that could give a false negative? (I'm not sure it would matter much.) Maybe "if (!MAX && generation != max + 1)". > + graph_report("commit-graph has incorrect generation for %s", > + oid_to_hex(&cur_oid)); > + } else { > + graph_parents = graph_commit->parents; > + > + while (graph_parents) { > + if (graph_parents->item->generation) > + graph_report("commit-graph has generation ZERO for %s but not its parent, %s", > + oid_to_hex(&cur_oid), > + oid_to_hex(&graph_parents->item->object.oid)); > + graph_parents = graph_parents->next; > + } > + } > + } > + > return verify_commit_graph_error; > } At this point, I should admit that I went through the above thinking "right, makes sense, ok, sure". I was not really going "hmm, I wonder ..." This looks like the real meat of "verify", and I'll try to look it over with a fresh pair of eyes tomorrow. > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > + corrupt_data $objdir/info/commit-graph 1134 "\01" && > + corrupt_data $objdir/info/commit-graph 1312 "\01" && > + corrupt_data $objdir/info/commit-graph 1332 "\01" && > + corrupt_data $objdir/info/commit-graph 1712 "\01" && > + corrupt_data $objdir/info/commit-graph 1340 "\01" && > + corrupt_data $objdir/info/commit-graph 1344 "\01" && Could you document these numbers somehow? (Maybe even calculate them from constant inputs, although that might be a form of premature optimization.) When some poor soul has to derive the corresponding numbers for a commit-graph with NewHash, they will thank you. Martin