On Thu, Oct 11, 2018 at 07:52:21PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Oct 10 2018, Ævar Arnfjörð Bjarmason wrote:
>
> > On Wed, Oct 10 2018, SZEDER Gábor wrote:
> >
> >> On Wed, Oct 10, 2018 at 11:56:45PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>> On Wed, Oct 10 2018, SZEDER Gábor wrote:
> >>
> >>> >> for (i = 0; i < oids->nr; i++) {
> >>> >> + display_progress(progress, ++j);
> >>>
> >>> [...]
> >>>
> >>> > This display_progress() call, however, doesn't seem to be necessary.
> >>> > First, it counts all commits for a second time, resulting in the ~2x
> >>> > difference compared to the actual number of commits, and then causing
> >>> > my confusion. Second, all what this loop is doing is setting a flag
> >>> > in commits that were already looked up and parsed in the above loops.
> >>> > IOW this loop is very fast, and the progress indicator jumps from
> >>> > ~780k right to 1.5M, even on my tiny laptop, so it doesn't need a
> >>> > progress indicator at all.
>
> Hrm, actually reading this again your initial post says we end up with a
> 2x difference v.s. the number of commits, but it's actually 3x.

Well, it depends on how you create the commit-graph and on the repo as
well, I guess.  I ran 'git commit-graph write --reachable' in a repo
created by 'git clone --single-branch ...', and in that case the
difference is only ~2x (the first loop in close_reachable() has as many
iterations as the number of refs).  If the repo were to contain twice
as many refs as commits, then the difference could be as high as 4x.

However, I think I might have noticed another progress counting issue
as well.  I will get back to it later, but first I have to get my
numbers straight.

> The loop
> that has a rather trivial runtime comparatively is the 3x, but the 2x
> loop takes a notable amount of time. So e.g. on git.git:
>
> $ git rev-list --all | wc -l; ~/g/git/git commit-graph write
> 166678
> Annotating commits in commit graph: 518463, done.
> Computing commit graph generation numbers: 100% (172685/172685), done.