On Thu, Jul 29, 2021 at 10:20 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Thu, Jul 29, 2021 at 03:58:34AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > This series is more about strmaps & memory pools than merge logic. CC'ing
> > Peff since he reviewed the strmap work[1], and that work included a number
> > of decisions that specifically had this series in mind.
>
> I haven't been following the other optimization threads very closely,
> but I'll try to give my general impressions.
>
> > === Basic Optimization idea ===
> >
> > In this series, I make use of memory pools to get faster allocations and
> > deallocations for many data structures that tend to all be deallocated at
> > the same time anyway.
> >
> > === Results ===
> >
> > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> > performance work; instrument with trace2_region_* calls", 2020-10-28), the
> > changes in just this series improves the performance as follows:
> >
> >                      Before Series        After Series
> > no-renames:       204.2 ms ± 3.0 ms    198.3 ms ± 2.9 ms
> > mega-renames:     1.076  s ± 0.015 s   661.8 ms ± 5.9 ms
> > just-one-mega:    364.1 ms ± 7.0 ms    264.6 ms ± 2.5 ms
>
> Pretty good results for the mega-renames case. I do wonder how much this
> matters in practice. That case is intentionally stressing the system,
> though I guess it's not too far-fetched (it's mostly a big directory
> rename). However, just "git checkout" across the rename already takes
> more than a second. So on the one hand, 400ms isn't nothing. On the
> other, I doubt anybody is likely to notice in the grand scheme of
> things.

The mega-renames case was spurred by looking at a repository at $DAYJOB
and trying to generate a similar testcase on a well-known open source
repository with approximately the same number of files; the Linux
kernel was handy for that. Technically, to make it match the right
number of renames, I would have needed to rename more toplevel
directories, but it was close enough and I liked having a simple
testcase.

So, to me, the testcase is more unusual in the number of patches being
rebased than in the number of renames, but I chose a long sequence of
patches (with lots of different types of changes) because that served
as a good correctness check and even ended up spurring searches for
tweaks to existing optimizations. And I added just-one-mega to see how
a simple cherry-pick would fare with the large number of renames; it
too shows a good relative speedup.

You make fair points about the absolute timings for rebase, but the
"grand scheme of things" involves usecases outside of traditional
rebases. Some of those involve far more merges, making the relative
timings more important. They also involve much less overhead: not only
do they skip the "git checkout" present in most rebases, they also
avoid the index and worktree updating included in the timings above.
Without that extra overhead, the relative improvement from this series
is even greater.

One particular example usecase that is coming soon is the introduction
of `git log --remerge-diff`; it makes the timing of individual merges
critical since it performs so many of them. And it becomes even more
important for users who change the default from --diff-merges=off to
--diff-merges=remerge-diff, because then any `git log -p` will
potentially remerge hundreds or thousands of merge commits. (I've had
lots of users whose -p implies --remerge-diff since last November.)

> And we're paying a non-trivial cost in code complexity to do it (though
> I do think you've done an admirable job of making that cost as low as
> possible). Dropping the USE_MEMORY_POOL flag and just always using a
> pool would make a lot of that complexity go away. I understand how it
> makes leak hunting harder, but I think simpler code would probably be
> worth the tradeoff (and in a sense, there _aren't_ leaks in an
> always-pool world; we're holding on to all of the memory through the
> whole operation).

Yeah, I had to keep the USE_MEMORY_POOL flag while rebasing my sequence
of optimization series, to make sure I kept the intermediate steps
clean as I upstreamed all the work. I ended up leaving (a simplified
form of) it in when I was done, but it has probably already served its
purpose; I'd be fine with dropping it.
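For list readers who haven't looked at the series: the complexity Peff
mentions is mostly that, with the flag in place, every allocation and
free site has to cope with both modes. A rough sketch of the pattern
(illustrative only; these wrapper names are made up, not taken from the
series):

    #include "git-compat-util.h"  /* xcalloc(), free() */
    #include "mem-pool.h"

    /*
     * Illustrative wrapper: allocate from the pool when one is
     * configured; fall back to the regular heap otherwise.
     */
    static void *maybe_pool_calloc(struct mem_pool *pool,
                                   size_t count, size_t size)
    {
            if (pool)
                    return mem_pool_calloc(pool, count, size);
            return xcalloc(count, size);
    }

    /* ...and every free site needs the matching conditional: */
    static void maybe_pool_free(struct mem_pool *pool, void *ptr)
    {
            if (!pool)
                    free(ptr); /* pool memory is released en masse later */
    }

Dropping the flag would collapse all of that down to plain
mem_pool_*() calls.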
> I assume your tests are just done using the regular glibc allocator. I

Yes.

> also wondered how plugging in a better allocator might fare. Here are
> timings I did of your mega-renames case with three binaries: one built
> with USE_MEMORY_POOL set to 0, one with it set to 1, and one with it set
> to 0 but adding "-ltcmalloc" to EXTLIBS via config.mak.
>
>   $ hyperfine \
>       -p 'git checkout hwmon-updates &&
>           git reset --hard fd8bdb23b91876ac1e624337bb88dc1dcc21d67e &&
>           git checkout 5.4-renames^0' \
>       -L version nopool,pool,tcmalloc \
>       './test-tool.{version} fast-rebase --onto HEAD base hwmon-updates'

Ooh, I didn't know about -L.

> Benchmark #1: ./test-tool.nopool fast-rebase --onto HEAD base hwmon-updates
>   Time (mean ± σ):     921.1 ms ± 146.0 ms    [User: 843.0 ms, System: 77.5 ms]
>   Range (min … max):   660.9 ms … 1112.2 ms    10 runs
>
> Benchmark #2: ./test-tool.pool fast-rebase --onto HEAD base hwmon-updates
>   Time (mean ± σ):     635.4 ms ± 125.5 ms    [User: 563.7 ms, System: 71.3 ms]
>   Range (min … max):   496.8 ms … 856.7 ms    10 runs
>
> Benchmark #3: ./test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates
>   Time (mean ± σ):     727.3 ms ± 139.9 ms    [User: 654.1 ms, System: 72.9 ms]
>   Range (min … max):   476.3 ms … 900.5 ms    10 runs

That's some _really_ wide variance on your runs, which makes me wonder
whether other things running on your (I presume) laptop are muddying
the numbers. Would the tcmalloc case actually have the fastest run in
general, or was it just lucky to hit a "quiet" moment on the laptop?
Or perhaps my pre-warming script helps reduce variance more than I
thought...

> Summary
>   './test-tool.pool fast-rebase --onto HEAD base hwmon-updates' ran
>     1.14 ± 0.32 times faster than './test-tool.tcmalloc fast-rebase --onto HEAD base hwmon-updates'
>     1.45 ± 0.37 times faster than './test-tool.nopool fast-rebase --onto HEAD base hwmon-updates'
>
> The pool allocator does come out ahead when comparing means, but the
> improvement is within the noise (and the fastest run was actually with
> tcmalloc).
>
> I was also curious about peak heap usage. According to massif, the pool
> version peaks at ~800k extra (out of 82MB), which is negligible. Plus it
> has fewer overall allocations, so it seems to actually save 4-5MB in
> malloc overhead (though I would imagine that varies between allocators;
> I'm just going from massif numbers here).

I did similar testing a year ago, before I even looked at memory pools.
I was surprised by how big a speedup I saw, and considered asking on
the list whether we could push to use a different allocator by default.
Ultimately, I figured that probably wouldn't fly, and distributors
might override our choices anyway.

It was at that point that I decided to start tweaking mem-pool.[ch]
(which ended up getting merged at edab8a8d07 ("Merge branch
'en/mem-pool'", 2020-08-27)), and then integrating that into
strmap/strset/strintmap -- all in an effort to guarantee that we
realized the speedups that I knew were possible from my testing with
the special allocators.
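For concreteness, the strmap integration means a map can borrow all of
its allocations from a caller-supplied pool and release everything in
one shot. A minimal sketch using the mem-pool and strmap APIs (real
payloads and error handling elided):

    #include "mem-pool.h"
    #include "strmap.h"

    static void pool_backed_map_demo(void)
    {
            struct mem_pool pool;
            struct strmap map;

            mem_pool_init(&pool, 0);
            /* strdup_strings=1: keys are still copied, but into the pool */
            strmap_init_with_options(&map, &pool, 1);

            strmap_put(&map, "some/path", (void *)"some value");
            /* ...thousands more insertions, each a cheap pool allocation... */

            /*
             * No per-entry free() calls: clearing the map is nearly free
             * when a pool is in use, and discarding the pool releases all
             * of the memory at once.
             */
            strmap_clear(&map, 0);
            mem_pool_discard(&pool, 0);
    }

That bulk-discard pattern is where the speedup in this series comes
from.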
> So...I dunno. It's hard to assess the real-world impact of the speedup,
> compared to the complexity cost. Ultimately, this is changing code that
> you wrote and will probably maintain. So I'd leave the final decision
> for that tradeoff to you. I'm just injecting some thoughts and numbers. :)

It's nice to see others duplicate the results, and I appreciate the
sanity check on overall usefulness. If it were just about optimizing
rebase, I probably could have quit long ago when I had 15s rebases of
35 patches. It was part stubbornness on my part, wondering why it
couldn't be sub-second, and part knowing of other usecases that I
wanted to make attractive to others. I want --remerge-diff to be
useful *and* practical. I want rebasing/cherry-picking un-checked-out
branches to be useful and practical. And I'd like to entice server
operators (GitHub, GitLab, etc.) to use the same merge machinery used
by end users, so that reported merge-ability matches what end users
see when they try the merge themselves.

I think you make a good suggestion to drop the USE_MEMORY_POOL switch.
I think I'll do it as an additional patch at the end of the series,
just so it's easy for me to restore if by chance it's ever needed.