On 2/18/2020 12:59 PM, Jakub Narebski wrote: > "Jeff King via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: Jeff King <peff@xxxxxxxx> >> >> Looking at the diff of commit objects in pack order is much faster than >> in sha1 order, as it gives locality to the access of tree deltas > > Nitpick: should we still say sha1 order? Git is still using SHA-1 as an > *oid*, but hopefully soon it will be transitioning to NewHash = SHA-256. > (No need to change anything.) > >> (whereas sha1 order is effectively random). Unfortunately the >> commit-graph code sorts the commits (several times, sometimes as an oid >> and sometimes a pointer-to-commit), and we ultimately traverse in sha1 >> order. > > Actually, commit-graph code needs write_commit_graph_context.commits.list > to be in lexicographical order to be able to turn position in graph into > reference to a commit. The information about the parents of the commit > are stored using positional references within the graph file. > You are right. Fixing the commit message in v3. >> >> Instead, let's remember the position at which we see each commit, and >> traverse in that order when looking at bloom filters. This drops my time >> for "git commit-graph write --changed-paths" in linux.git from ~4 >> minutes to ~1.5 minutes. > > Nitpick: with reordering of patches (which I think is otherwise a good > thing) this patch actually comes before the one adding "--changed-paths" > option to "git commit-graph write". So it 'This would drop my time' > rather than 'This drops my time...' ;-) > :) I will fix that up. >> >> Probably the "--reachable" code path would want something similar. > > Has anyone tried doing this? > I will and I will include the perf numbers in the appropriately in v3. >> + >> char *get_commit_graph_filename(const char *obj_dir) >> { >> char *filename = xstrfmt("%s/info/commit-graph", obj_dir); >> @@ -1027,6 +1051,8 @@ static int add_packed_commits(const struct object_id *oid, >> oidcpy(&(ctx->oids.list[ctx->oids.nr]), oid); >> ctx->oids.nr++; >> >> + set_commit_pos(ctx->r, oid); >> + >> return 0; >> } >> >> @@ -1147,6 +1173,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx) >> { >> int i; >> struct progress *progress = NULL; >> + struct commit **sorted_by_pos; > > In the next patch in series we would sort commits by generation number > and creation data; shouldn't this variable name be more generic to > reflect this, for example just `sorted_commits` or `commits_sorted`? > Good call. I will clean this up in both commits. Thanks for the review! Cheers! Garima Singh