On 9/12/2019 10:44 AM, Jeff King wrote: > When the client has asked for certain shallow options like > "deepen-since", we do a custom rev-list walk that pretends to be > shallow. Before doing so, we have to disable the commit-graph, since it > is not compatible with the shallow view of the repository. That's > handled by 829a321569 (commit-graph: close_commit_graph before shallow > walk, 2018-08-20). That commit literally closes and frees our > repo->objects->commit_graph struct. > > That creates an interesting problem for commits that have _already_ been > parsed using the commit graph. Their commit->object.parsed flag is set, > their commit->graph_pos is set, but their commit->maybe_tree may still > be NULL. When somebody later calls repo_get_commit_tree(), we see that > we haven't loaded the tree oid yet and try to get it from the commit > graph. But since it has been freed, we segfault! > > So the root of the issue is a data dependency between the commit's > lazy-load of the tree oid and the fact that the commit graph can go > away mid-process. How can we resolve it? > > There are a couple of general approaches: > > 1. The obvious answer is to avoid loading the tree from the graph when > we see that it's NULL. But then what do we return for the tree oid? > If we return NULL, our caller in do_traverse() will rightly > complain that we have no tree. We'd have to fallback to loading the > actual commit object and re-parsing it. That requires teaching > parse_commit_buffer() to understand re-parsing (i.e., not starting > from a clean slate and not leaking any allocated bits like parent > list pointers). > > 2. When we close the commit graph, walk through the set of in-memory > objects and clear any graph_pos pointers. But this means we also > have to "unparse" any such commits so that we know they still need > to open the commit object to fill in their trees. So it's no less > complicated than (1), and is more expensive (since we clear objects > we might not later need). > > 3. Stop freeing the commit-graph struct. Continue to let it be used > for lazy-loads of tree oids, but let upload-pack specify that it > shouldn't be used for further commit parsing. > > 4. Push the whole shallow rev-list out to its own sub-process, with > the commit-graph disabled from the start, giving it a clean memory > space to work from. > > I've chosen (3) here. Options (1) and (2) would work, but are > non-trivial to implement. Option (4) is more expensive, and I'm not sure > how complicated it is (shelling out for the actual rev-list part is > easy, but we do then parse the resulting commits internally, and I'm not > clear which parts need to be handling shallow-ness). > > The new test in t5500 triggers this segfault, but see the comments there > for how horribly intimate it has to be with how both upload-pack and > commit graphs work. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > commit-graph.c | 12 ++++++++++++ > commit-graph.h | 6 ++++++ > repository.h | 3 +++ > t/t5500-fetch-pack.sh | 38 ++++++++++++++++++++++++++++++++++++++ > upload-pack.c | 2 +- > 5 files changed, 60 insertions(+), 1 deletion(-) > > diff --git a/commit-graph.c b/commit-graph.c > index baeaf0d1bf..bbde647f8b 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -468,6 +468,13 @@ static int prepare_commit_graph(struct repository *r) > { > struct object_directory *odb; > > + /* > + * This must come before the "already attempted?" check below, because > + * we want to disable even an already-loaded graph file. > + */ > + if (r->commit_graph_disabled) > + return 0; > + > if (r->objects->commit_graph_attempted) > return !!r->objects->commit_graph; > r->objects->commit_graph_attempted = 1; > @@ -2101,3 +2108,8 @@ void free_commit_graph(struct commit_graph *g) > free(g->filename); > free(g); > } > + > +void disable_commit_graph(struct repository *r) > +{ > + r->commit_graph_disabled = 1; > +} Thanks for the additional comment and using the repo struct. LGTM. -Stolee