Fixes issues SZEDER raised with v1, except displaying an accurate ETA in write_graph_*(). As noted in 2/6 I don't think it's worth it, I just adjusted the message instead. Ævar Arnfjörð Bjarmason (6): commit-graph write: rephrase confusing progress output commit-graph write: add more progress output commit-graph write: show progress for object search commit-graph write: add more describing progress output commit-graph write: remove empty line for readability commit-graph write: add even more progress output commit-graph.c | 92 +++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 73 insertions(+), 19 deletions(-) Range-diff: 1: 751d3a7561 ! 1: 093c63e99f commit-graph write: add more progress output a => b | 0 1 file changed, 0 insertions(+), 0 deletions(-) @@ -13,22 +13,30 @@ point at which we're not producing progress output: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph: 6418991, done. - Computing commit graph generation numbers: 100% (797205/797205), done. - Writing out commit graph chunks: 2399861, done. + Finding commits for commit graph: 6365492, done. + Computing commit graph generation numbers: 100% (797222/797222), done. + Writing out commit graph: 2399912, done. - This "graph chunks" number is not meant to be meaningful to the user, + This "writing out" number is not meant to be meaningful to the user, but just to show that we're doing work and the command isn't hanging. + In the current implementation it's approximately 4x the number of + commits. As noted in on-list discussion[1] we could add the loops up + and show percentage progress here, but I don't think it's worth it. It + would make the implementation more complex and harder to maintain for + very little gain. + On a much larger in-house repository I have we'll show (note how we also say "Annotating[...]"): $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph: 48271163, done. - Annotating commit graph: 21424536, done. - Computing commit graph generation numbers: 100% (7141512/7141512), done. - Writing out commit graph chunks: 21424913, done. + Finding commits for commit graph: 50026015, done. + Annotating commit graph: 21567407, done. + Computing commit graph generation numbers: 100% (7144680/7144680), done. + Writing out commit graph: 21434417, done. + + 1. https://public-inbox.org/git/20181120165800.GB30222@xxxxxxxxxx/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> @@ -50,8 +58,7 @@ */ for (i = 0; i < 256; i++) { while (count < nr_commits) { -+ if (progress) -+ display_progress(progress, ++*progress_cnt); ++ display_progress(progress, ++*progress_cnt); if ((*list)->object.oid.hash[0] != i) break; count++; @@ -68,8 +75,7 @@ int count; - for (count = 0; count < nr_commits; count++, list++) + for (count = 0; count < nr_commits; count++, list++) { -+ if (progress) -+ display_progress(progress, ++*progress_cnt); ++ display_progress(progress, ++*progress_cnt); hashwrite(f, (*list)->object.oid.hash, (int)hash_len); + } } @@ -87,15 +93,13 @@ struct commit **list = commits; struct commit **last = commits + nr_commits; @@ + struct commit_list *parent; int edge_value; uint32_t packedDate[2]; ++ display_progress(progress, ++*progress_cnt); -+ if (progress) -+ display_progress(progress, ++*progress_cnt); -+ parse_commit(*list); hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len); - @@ static void write_graph_chunk_large_edges(struct hashfile *f, @@ -108,20 +112,18 @@ struct commit **list = commits; struct commit **last = commits + nr_commits; @@ + commits, nr_commits, commit_to_sha1); ++ display_progress(progress, ++*progress_cnt); -+ if (progress) -+ display_progress(progress, ++*progress_cnt); -+ if (edge_value < 0) edge_value = GRAPH_PARENT_MISSING; - else if (!parent->next) @@ int num_extra_edges; struct commit_list *parent; struct progress *progress = NULL; -+ uint64_t progress_cnt; ++ uint64_t progress_cnt = 0; if (!commit_graph_compatible(the_repository)) return; @@ -135,8 +137,8 @@ - write_graph_chunk_large_edges(f, commits.list, commits.nr); + if (report_progress) + progress = start_delayed_progress( -+ _("Writing out commit graph chunks"), -+ progress_cnt = 0); ++ _("Writing out commit graph"), ++ 0); + write_graph_chunk_fanout(f, commits.list, commits.nr, progress, + &progress_cnt); + write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr, 2: d750f0dd16 ! 2: 6c71de9460 commit-graph write: show progress for object search a => b | 0 1 file changed, 0 insertions(+), 0 deletions(-) @@ -8,18 +8,17 @@ Before we'd emit on e.g. linux.git with "commit-graph write": - Finding commits for commit graph: 6418991, done. + Finding commits for commit graph: 6365492, done. [...] And now: - Finding commits for commit graph: 100% (6418991/6418991), done. + Finding commits for commit graph: 100% (6365492/6365492), done. [...] - Since the commit graph only includes those commits that are - packed (via for_each_packed_object(...)) the - approximate_object_count() returns the actual number of objects we're - going to process. + Since the commit graph only includes those commits that are packed + (via for_each_packed_object(...)) the approximate_object_count() + returns the actual number of objects we're going to process. Still, it is possible due to a race with "gc" or another process maintaining packs that the number of objects we're going to process is @@ -40,7 +39,7 @@ @@ struct commit_list *parent; struct progress *progress = NULL; - uint64_t progress_cnt; + uint64_t progress_cnt = 0; + unsigned long approx_nr_objects; if (!commit_graph_compatible(the_repository)) 3: a175ab49ff ! 3: c665dbdacb commit-graph write: add more describing progress output a => b | 0 1 file changed, 0 insertions(+), 0 deletions(-) @@ -10,18 +10,26 @@ we support: $ git commit-graph write - Finding commits for commit graph among packed objects: 100% (6418991/6418991), done. + Finding commits for commit graph among packed objects: 100% (6365492/6365492), done. [...] + + # Actually we don't emit this since this takes almost no time at + # all. But if we did (s/_delayed//) we'd show: $ git for-each-ref --format='%(objectname)' | git commit-graph write --stdin-commits - Finding commits for commit graph from 584 ref tips: 100% (584/584), done. + Finding commits for commit graph from 584 refs: 100% (584/584), done. [...] + $ (cd .git/objects/pack/ && ls *idx) | git commit-graph write --stdin-pack - Finding commits for commit graph in 4 packs: 6418991, done. + Finding commits for commit graph in 3 packs: 6365492, done. [...] - The middle on of those is going to be the output users will most - commonly see, since it'll be emitted when they get the commit graph - via gc.writeCommitGraph=true. + The middle on of those is going to be the output users might see in + practice, since it'll be emitted when they get the commit graph via + gc.writeCommitGraph=true. But as noted above you need a really large + number of refs for this message to show. It'll show up on a test + repository I have with ~165k refs: + + Finding commits for commit graph from 165203 refs: 100% (165203/165203), done. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> @@ -30,7 +38,7 @@ +++ b/commit-graph.c @@ struct progress *progress = NULL; - uint64_t progress_cnt; + uint64_t progress_cnt = 0; unsigned long approx_nr_objects; + struct strbuf progress_title = STRBUF_INIT; @@ -66,8 +74,8 @@ - commit_hex->nr); + if (report_progress) { + strbuf_addf(&progress_title, -+ Q_("Finding commits for commit graph from %d ref tip", -+ "Finding commits for commit graph from %d ref tips", ++ Q_("Finding commits for commit graph from %d ref", ++ "Finding commits for commit graph from %d refs", + commit_hex->nr), + commit_hex->nr); + progress = start_delayed_progress(progress_title.buf, 4: 4e11c8b2fd = 4: f70fc5045d commit-graph write: remove empty line for readability a => b | 0 1 file changed, 0 insertions(+), 0 deletions(-) 5: 6fbba22fac ! 5: 2e943fa925 commit-graph write: add even more progress output a => b | 0 1 file changed, 0 insertions(+), 0 deletions(-) @@ -4,24 +4,26 @@ Add more progress output to sections of code that can collectively take 5-10 seconds on a large enough repository. On a test repository - with 7141512 commits (see earlier patches for details) we'll now emit: + with I have with ~7 million commits and ~50 million objects we'll now + emit: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph among packed objects: 100% (50009986/50009986), done. - Annotating commit graph: 21564240, done. - Counting distinct commits in commit graph: 100% (7188080/7188080), done. - Finding extra edges in commit graph: 100% (7188080/7188080), done. - Computing commit graph generation numbers: 100% (7143635/7143635), done. - Writing out commit graph chunks: 21431282, done. + Finding commits for commit graph among packed objects: 100% (50026015/50026015), done. + Annotating commit graph: 21567407, done. + Counting distinct commits in commit graph: 100% (7189147/7189147), done. + Finding extra edges in commit graph: 100% (7189147/7189147), done. + Computing commit graph generation numbers: 100% (7144680/7144680), done. + Writing out commit graph: 21434417, done. - Whereas on a medium-sized repository such as linux.git we'll still + Whereas on a medium-sized repository such as linux.git these new + progress bars won't have time to kick in and as before and we'll still emit output like: $ ~/g/git/git --exec-path=$HOME/g/git commit-graph write - Finding commits for commit graph among packed objects: 100% (6365328/6365328), done. - Annotating commit graph: 2391621, done. - Computing commit graph generation numbers: 100% (797207/797207), done. - Writing out commit graph chunks: 2399867, done. + Finding commits for commit graph among packed objects: 100% (6365492/6365492), done. + Annotating commit graph: 2391666, done. + Computing commit graph generation numbers: 100% (797222/797222), done. + Writing out commit graph: 2399912, done. The "Counting distinct commits in commit graph" phase will spend most of its time paused at "0/*" as we QSORT(...) the list. That's not -- 2.20.0.rc0.387.gc7a69e6b6c