Improvements since v6: * Integrate my "commit-graph write: use pack order when finding commits" patch, and per Junio's suggestion put it at the start so it's easier to split the two apart. * A signed-off-by of mine was missing on that patch. Fixed. * Replace SZEDER's two patches with his re-roll at https://public-inbox.org/git/20190118170549.30403-1-szeder.dev@xxxxxxxxx/ for convenienc, since mine needed rebasing on top of his. * SZEDER rightly pointed out that I was doing some work for nothing in https://public-inbox.org/git/20190118171605.GM840@xxxxxxxxxx/; fixed. SZEDER Gábor (2): commit-graph: rename "large edges" to "extra edges" commit-graph: don't call write_graph_chunk_large_edges() unnecessarily Ævar Arnfjörð Bjarmason (8): commit-graph write: use pack order when finding commits commit-graph write: add "Writing out" progress output commit-graph write: more descriptive "writing out" output commit-graph write: show progress for object search commit-graph write: add more descriptive progress output commit-graph write: remove empty line for readability commit-graph write: add itermediate progress commit-graph write: emit a percentage for all progress .../technical/commit-graph-format.txt | 4 +- builtin/commit-graph.c | 4 +- commit-graph.c | 136 +++++++++++++----- commit-graph.h | 2 +- t/t5318-commit-graph.sh | 14 +- 5 files changed, 110 insertions(+), 50 deletions(-) Range-diff: 1: 07d06c50c0 < -: ---------- commit-graph: rename 'num_extra_edges' variable to 'num_large_edges' -: ---------- > 1: 9ca0b5573b commit-graph write: use pack order when finding commits -: ---------- > 2: 4370ea7591 commit-graph: rename "large edges" to "extra edges" 2: 904dda1e7a ! 3: 3468e30c89 commit-graph: don't call write_graph_chunk_large_edges() unnecessarily @@ -10,19 +10,19 @@ there won't be a 'Large Edge List' chunk written, only the three mandatory chunks. - However, when it comes to writing chunk data, write_commit_graph() - unconditionally invokes write_graph_chunk_large_edges(), even when it - was decided earlier that that chunk won't be written. Strictly - speaking there is no bug here, because write_graph_chunk_large_edges() - won't write anything because it won't find any commits with more than - two parents, but then it unnecessarily and in vain looks through all - commits once again in search for such commits. + However, when it later comes to writing actual chunk data, + write_commit_graph() unconditionally invokes + write_graph_chunk_large_edges(), even when it was decided earlier that + that chunk won't be written. Strictly speaking there is no bug here, + because write_graph_chunk_large_edges() won't write anything if it + doesn't find any commits with more than two parents, but then it + unnecessarily and in vain looks through all commits once again in + search for such commits. Don't call write_graph_chunk_large_edges() when that chunk won't be written to spare an unnecessary iteration over all commits. Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> - Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> diff --git a/commit-graph.c b/commit-graph.c --- a/commit-graph.c @@ -31,9 +31,9 @@ write_graph_chunk_fanout(f, commits.list, commits.nr); write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr); write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr); -- write_graph_chunk_large_edges(f, commits.list, commits.nr); -+ if (num_large_edges) -+ write_graph_chunk_large_edges(f, commits.list, commits.nr); +- write_graph_chunk_extra_edges(f, commits.list, commits.nr); ++ if (num_extra_edges) ++ write_graph_chunk_extra_edges(f, commits.list, commits.nr); close_commit_graph(the_repository); finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC); 3: 1126c7e29d < -: ---------- commit-graph write: rephrase confusing progress output 4: 9c17f56ed3 ! 4: 1860fa606e commit-graph write: add "Writing out" progress output @@ -10,14 +10,14 @@ small repositories, but before this change we'd noticeably hang for 2-3 seconds at the end on medium sized repositories such as linux.git. - Now we'll instead show output like this, and have no human-observable - point at which we're not producing progress output: + Now we'll instead show output like this, and reduce the + human-observable times at which we're not producing progress output: - $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph: 6365442, done. - Annotating commit graph: 2391666, done. - Computing commit graph generation numbers: 100% (797222/797222), done. - Writing out commit graph: 100% (3188888/3188888), done. + $ ~/g/git/git --exec-path=$HOME/g/git -C ~/g/2015-04-03-1M-git commit-graph write + Finding commits for commit graph: 13064614, done. + Expanding reachable commits in commit graph: 1000447, done. + Computing commit graph generation numbers: 100% (1000447/1000447), done. + Writing out commit graph: 100% (3001341/3001341), done. This "Writing out" number is 3x or 4x the number of commits, depending on the graph we're processing. A later change will make this explicit @@ -87,7 +87,7 @@ hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len); @@ - static void write_graph_chunk_large_edges(struct hashfile *f, + static void write_graph_chunk_extra_edges(struct hashfile *f, struct commit **commits, - int nr_commits) + int nr_commits, @@ -107,7 +107,7 @@ parent = parent->next) num_parents++; @@ - int num_large_edges; + int num_extra_edges; struct commit_list *parent; struct progress *progress = NULL; + uint64_t progress_cnt = 0; @@ -121,25 +121,16 @@ - write_graph_chunk_fanout(f, commits.list, commits.nr); - write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr); - write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr); -+ if (report_progress) { -+ /* -+ * Each of the write_graph_chunk_*() functions just -+ * below loops over our N commits. This number must be -+ * kept in sync with the number of passes we're doing. -+ */ -+ int graph_passes = 3; -+ if (num_large_edges) -+ graph_passes++; ++ if (report_progress) + progress = start_delayed_progress( + _("Writing out commit graph"), -+ graph_passes * commits.nr); -+ } ++ num_chunks * commits.nr); + write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt); + write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt); + write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt); - if (num_large_edges) -- write_graph_chunk_large_edges(f, commits.list, commits.nr); -+ write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt); + if (num_extra_edges) +- write_graph_chunk_extra_edges(f, commits.list, commits.nr); ++ write_graph_chunk_extra_edges(f, commits.list, commits.nr, progress, &progress_cnt); + stop_progress(&progress); close_commit_graph(the_repository); 5: 79b0a467d9 ! 5: cc36909add commit-graph write: more descriptive "writing out" output @@ -11,11 +11,11 @@ has to do with writing out the commit graph. Now e.g. on linux.git we emit: - $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph: 6365442, done. - Annotating commit graph: 2391666, done. - Computing commit graph generation numbers: 100% (797222/797222), done. - Writing out commit graph in 4 passes: 100% (3188888/3188888), done. + $ ~/g/git/git --exec-path=$HOME/g/git -C ~/g/linux commit-graph write + Finding commits for commit graph: 6529159, done. + Expanding reachable commits in commit graph: 815990, done. + Computing commit graph generation numbers: 100% (815983/815983), done. + Writing out commit graph in 4 passes: 100% (3263932/3263932), done. A note on i18n: Why are we using the Q_() function and passing a number & English text for a singular which'll never be used? Because @@ -36,26 +36,28 @@ if (!commit_graph_compatible(the_repository)) return; @@ - int graph_passes = 3; - if (num_large_edges) - graph_passes++; + hashwrite(f, chunk_write, 12); + } + +- if (report_progress) ++ if (report_progress) { + strbuf_addf(&progress_title, + Q_("Writing out commit graph in %d pass", + "Writing out commit graph in %d passes", -+ graph_passes), -+ graph_passes); ++ num_chunks), ++ num_chunks); progress = start_delayed_progress( - _("Writing out commit graph"), + progress_title.buf, - graph_passes * commits.nr); - } + num_chunks * commits.nr); ++ } write_graph_chunk_fanout(f, commits.list, commits.nr, progress, &progress_cnt); -@@ - write_graph_chunk_large_edges(f, commits.list, commits.nr, progress, &progress_cnt); + write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt); + write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr, progress, &progress_cnt); + if (num_extra_edges) + write_graph_chunk_extra_edges(f, commits.list, commits.nr, progress, &progress_cnt); stop_progress(&progress); - + strbuf_release(&progress_title); -+ + close_commit_graph(the_repository); finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC); - commit_lock_file(&lk); 6: b32be83b38 ! 6: 9a4aec16ca commit-graph write: show progress for object search @@ -8,12 +8,12 @@ Before we'd emit on e.g. linux.git with "commit-graph write": - Finding commits for commit graph: 6365442, done. + Finding commits for commit graph: 6529159, done. [...] And now: - Finding commits for commit graph: 100% (6365442/6365442), done. + Finding commits for commit graph: 100% (6529159/6529159), done. [...] Since the commit graph only includes those commits that are packed @@ -59,7 +59,8 @@ - _("Finding commits for commit graph"), 0); + _("Finding commits for commit graph"), + approx_nr_objects); - for_each_packed_object(add_packed_commits, &oids, 0); + for_each_packed_object(add_packed_commits, &oids, + FOR_EACH_OBJECT_PACK_ORDER); + if (oids.progress_done < approx_nr_objects) + display_progress(oids.progress, approx_nr_objects); stop_progress(&oids.progress); 7: 54276723c0 ! 7: 6b523d0f0d commit-graph write: add more descriptive progress output @@ -10,17 +10,17 @@ we support: $ git commit-graph write - Finding commits for commit graph among packed objects: 100% (6365442/6365442), done. + Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. [...] # Actually we don't emit this since this takes almost no time at # all. But if we did (s/_delayed//) we'd show: $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits - Finding commits for commit graph from 584 refs: 100% (584/584), done. + Finding commits for commit graph from 630 refs: 100% (630/630), done. [...] $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack - Finding commits for commit graph in 2 packs: 6365442, done. + Finding commits for commit graph in 3 packs: 6529159, done. [...] The middle on of those is going to be the output users might see in @@ -89,5 +89,5 @@ - _("Finding commits for commit graph"), + _("Finding commits for commit graph among packed objects"), approx_nr_objects); - for_each_packed_object(add_packed_commits, &oids, 0); - if (oids.progress_done < approx_nr_objects) + for_each_packed_object(add_packed_commits, &oids, + FOR_EACH_OBJECT_PACK_ORDER); 8: 0e847366e1 = 8: 582f6f477f commit-graph write: remove empty line for readability 9: c388aff73e ! 9: 2c9fa68210 commit-graph write: add itermediate progress @@ -10,26 +10,29 @@ million objects we'll now emit: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph among packed objects: 100% (48333911/48333911), done. - Annotating commit graph: 21435984, done. - Counting distinct commits in commit graph: 100% (7145328/7145328), done. - Finding extra edges in commit graph: 100% (7145328/7145328), done. - Computing commit graph generation numbers: 100% (7145328/7145328), done. - Writing out commit graph in 4 passes: 100% (28581312/28581312), done. + Finding commits for commit graph among packed objects: 100% (124763727/124763727), done. + Loading known commits in commit graph: 100% (18989461/18989461), done. + Expanding reachable commits in commit graph: 100% (18989507/18989461), done. + Clearing commit marks in commit graph: 100% (18989507/18989507), done. + Counting distinct commits in commit graph: 100% (18989507/18989507), done. + Finding extra edges in commit graph: 100% (18989507/18989507), done. + Computing commit graph generation numbers: 100% (7250302/7250302), done. + Writing out commit graph in 4 passes: 100% (29001208/29001208), done. Whereas on a medium-sized repository such as linux.git these new progress bars won't have time to kick in and as before and we'll still emit output like: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph among packed objects: 100% (6365442/6365442), done. - Annotating commit graph: 2391666, done. - Computing commit graph generation numbers: 100% (797222/797222), done. - Writing out commit graph in 4 passes: 100% (3188888/3188888), done. + Finding commits for commit graph among packed objects: 100% (6529159/6529159), done. + Expanding reachable commits in commit graph: 815990, done. + Computing commit graph generation numbers: 100% (815983/815983), done. + Writing out commit graph in 4 passes: 100% (3263932/3263932), done. The "Counting distinct commits in commit graph" phase will spend most of its time paused at "0/*" as we QSORT(...) the list. That's not - optimal, but at least we don't seem to be stalling anymore. + optimal, but at least we don't seem to be stalling anymore most of the + time. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> @@ -59,7 +62,7 @@ @@ ALLOC_ARRAY(commits.list, commits.alloc); - num_large_edges = 0; + num_extra_edges = 0; + if (report_progress) + progress = start_delayed_progress( + _("Finding extra edges in commit graph"), @@ -73,7 +76,7 @@ @@ commits.nr++; } - num_chunks = num_large_edges ? 4 : 3; + num_chunks = num_extra_edges ? 4 : 3; + stop_progress(&progress); if (commits.nr >= GRAPH_PARENT_MISSING) 10: fd692499e0 < -: ---------- commit-graph write: emit a percentage for all progress -: ---------- > 10: e69bb1be2c commit-graph write: emit a percentage for all progress -- 2.20.1.153.gd81d796ee0