On Sat, Sep 07, 2019 at 01:04:40AM -0400, Jeff King wrote: > The commit-graph tool may read a lot of commits, but it only cares about > parsing their metadata (parents, trees, etc) and doesn't ever show the > messages to the user. And so it should not need save_commit_buffer, > which is meant for holding onto the object data of parsed commits so > that we can show them later. In fact, it's quite harmful to do so. > According to massif, the max heap of "git commit-graph write > --reachable" in linux.git before/after this patch (removing the commit > graph file in between) goes from ~1.1GB to ~270MB. > > Which isn't surprising, since the difference is about the sum of the > uncompressed sizes of all commits in the repository, and this was > equivalent to leaking them. > > This obviously helps if you're under memory pressure, but even without > it, things go faster. My before/after times for that command (without > massif) went from 12.521s to 11.874s, a speedup of ~5%. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > We didn't actually notice this on linux.git, but rather on a repository > with 130 million commits (don't ask). With this patch, I was able to > generate the commit-graph file with a peak heap of ~25GB, which is ~200 > bytes per commit. > > I'll bet we could do better with some effort, but obviously this case > was just pathological. For most cases this should be cheaper than a > normal repack (which probably spends that much memory on each object, > not just commits). > > builtin/commit-graph.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c > index 57863619b7..052696f1af 100644 > --- a/builtin/commit-graph.c > +++ b/builtin/commit-graph.c > @@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix) > builtin_commit_graph_usage, > PARSE_OPT_STOP_AT_NON_OPTION); > > + save_commit_buffer = 0; > + This looks exactly right to me. We had discussed a little bit off-list about where you might place this line, but I think that the spot you picked is perfect: as late as possible. Thankfully, the option parsing code here doesn't load any commits (though even if it did, I don't think that turning on/off 'save_commit_buffer' would really make much of a difference). So, the patch here looks obviously correct, and I don't think it needs a test or anything like that... besides: what is there to test? :). > if (argc > 0) { > if (!strcmp(argv[0], "read")) > return graph_read(argc, argv); > -- > 2.23.0.474.gb1abd76f7a Thanks, Taylor