Hi Uwe,

On Fri, May 28, 2021 at 2:40 PM Uwe Kleine-König
<u.kleine-koenig@xxxxxxxxxxxxxx> wrote:
>
> Hello Elijah,
>
> On Thu, May 27, 2021 at 04:08:32PM -0700, Elijah Newren wrote:
> > On Thu, May 27, 2021 at 2:59 PM Uwe Kleine-König
> > <u.kleine-koenig@xxxxxxxxxxxxxx> wrote:
> > > On Wed, May 26, 2021 at 07:38:08AM -0700, Elijah Newren wrote:
> > > > On Wed, May 26, 2021 at 3:13 AM Uwe Kleine-König
> > > > <u.kleine-koenig@xxxxxxxxxxxxxx> wrote:
...
> > Note: In your original report you had rename detection and it clearly
> > took a significant amount of time...
>
> FTR: My impression is that the repo I used for the first report is slow
> in general. Also git log sometimes takes a considerable time to start
> emitting output.
> ...
>
> I learned a few things since my last mail, here comes an updated test
> again on the machine and repo used for the initial report:
>
> ukl@xxxxxxxx:~/gsrc/linux$ wgit version
> git version 2.32.0.rc1
>
> ukl@xxxxxxxx:~/gsrc/linux$ cat rebasecheck
> #!/bin/bash
>
> set -e
>
> # do it once to heat the caches and ensure all objects are available already to have the next cycles identical.
> wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
> wgit rebase v5.10
>
> wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
> echo "rebase v5.10"
> time wgit rebase v5.10
>
> wgit checkout 0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
> echo "rebase --onto v5.10 v5.4"
> time wgit rebase --onto v5.10 v5.4
>
> I do the rebase now once before the timing for the reasons described in
> the comment. The second identical command is quite a bit quicker. Also
> now that the commands are scripted they are done in a smaller time frame
> (which matters as the machine is used heavily among my colleagues and
> me). I run the script a few times in a row, after all colleagues are in
> their week-end:
>
> ukl@xxxxxxxx:~/gsrc/linux$ bash rebasecheck
> ...
> rebase v5.10
> ...
> real 1m13.579s
> user 1m2.919s
> sys 0m6.220s
> ...
> rebase --onto v5.10 v5.4
> ...
> real 1m2.852s
> user 0m53.780s
> sys 0m6.225s
>
> ukl@xxxxxxxx:~/gsrc/linux$ bash rebasecheck
> ...
> rebase v5.10
> ...
> real 1m10.816s
> user 1m3.344s
> sys 0m6.991s
> ...
> rebase --onto v5.10 v5.4
> ...
> real 0m59.695s
> user 0m53.510s
> sys 0m5.579s
>
> ukl@xxxxxxxx:~/gsrc/linux$ bash rebasecheck
> ...
> rebase v5.10
> ...
> real 1m9.688s
> user 1m3.346s
> sys 0m6.105s
> ...
> rebase --onto v5.10 v5.4
> ...
> real 0m59.981s
> user 0m52.931s
> sys 0m6.282s
>
> So it's not a factor 2 any more, but still reproducibly quicker when
> --onto is used.

Yep, so that looks like the results I was getting. Adding
--reapply-cherry-picks should remove most of that time difference as I
stated in my previous email.

> > However, the 7-8 second difference (and the likely large differences
> > between 5.4 and 5.10) do suggest that Junio's hunch that fork-point
> > behavior being at play could be an issue in these two commands.

I don't think --no-fork-point will matter here since you are detaching
HEAD before running rebase. fork-point is all about looking up the
reflog of the current branch to find better matches.
--reapply-cherry-picks should help you out and erase most of this 7-8
second difference.

> > > > running again with either command would give you something closer to
> > > > the lower time both times. Is that the case? (Also, what's the
> > > > output of "git count-objects -v"?)
> > >
> > > After the above commands I have:
> > >
> > > count: 3203
> > > size: 17664
> > > in-pack: 4763753
> > > packs: 11
> > > size-pack: 1273957
> > > prune-packable: 19
> > > garbage: 0
> > > size-garbage: 0
> >
> > So, not freshly packed, but not in need of an automatic gc either.
> >
> > > alternate: /home/uwe/var/gitstore/linux.git/objects
> >
> > You've got an alternate? How well packed is it? (What does "git
> > count-objects -v" in that other repo show?)
>
> ...
>
> In the alternate I have:
>
> ukl@xxxxxxxx:/ptx/src/git/linux.git/objects$ wgit count-objects -v
> warning: garbage found: /ptx/work/user/git/linux.git/objects/pack/tmp_pack_X9gHnq
> count: 5035

This is really close to the threshold of needing repacking, but still
okay.

> size: 40720
> in-pack: 87083076
> packs: 1108

1108 packs!?!? This will make all kinds of operations slow. This
explains your comment about operations with your original repo being
slow in general, and why you feel you need to do a warmup run first to
get a reasonable timing. 50 is the limit where repacking is deemed
necessary; you're 2116% beyond that point. I've only seen repos with
pack counts near this level a couple times and they are excruciatingly
painful to deal with.

However, be careful not to use "git gc" or "git prune" in this repo,
since it's used as an alternate (doing so could corrupt the repos that
depend on this one). Just use "git repack" with the appropriate flags
instead.

> size-pack: 51109693

51G. Wow. A fresh clone of linux is waaay smaller than that. 3 G, I
think? I would have thought lots of your packs were small, but this
suggests you probably have lots of duplicate objects in these packs.

> prune-packable: 3050
> garbage: 1
> size-garbage: 1112612

And 1 G of garbage that could just be deleted.

> I rerun the script with -sort added:
>
> ukl@xxxxxxxx:~/gsrc/linux$ bash rebasecheck
> ...
> rebase v5.10
> ...
> real 0m25.047s
> user 0m17.652s
> sys 0m5.802s
> ...
> rebase --onto v5.10 v5.4
> ...
> real 0m12.471s
> user 0m7.854s
> sys 0m4.413s
>
> ukl@xxxxxxxx:~/gsrc/linux$ bash rebasecheck
> ...
> rebase v5.10
> ...
> real 0m22.180s
> user 0m17.219s
> sys 0m4.701s
> ...
> rebase --onto v5.10 v5.4
> ...
> real 0m12.341s
> user 0m7.308s
> sys 0m4.632s
>
> So -sort is quite a bit quicker, but the ~10s overhead when not using
> --onto is visible there, too.

Yeah, try adding --reapply-cherry-picks; I think that flag should shrink
most of the difference.
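
A minimal sketch of that suggestion, adapted from the rebasecheck script
quoted above. Assumptions: plain `git` stands in for the wgit wrapper,
`timed_rebase` is a made-up helper name, and the guard that skips
everything when the starting commit is missing is only there so the
script is harmless outside the linux checkout.

```shell
#!/bin/bash
# Sketch only: rebasecheck with --reapply-cherry-picks added to each rebase.
# Assumptions: plain `git` instead of the wgit wrapper; timed_rebase is a
# hypothetical helper name; meant to be run from inside the linux checkout.

start=0091ecb84cfdef0f4cb65810219f5ac9bb4341e5

# Check out the starting commit (detached HEAD), then time one rebase.
timed_rebase() {
    git checkout --quiet "$start" || return
    echo "rebase $*"
    time git rebase "$@"
}

if git cat-file -e "${start}^{commit}" 2>/dev/null; then
    # Warm-up run, as in the original script, so the timed runs are comparable.
    timed_rebase --reapply-cherry-picks v5.10 >/dev/null 2>&1 || exit

    timed_rebase --reapply-cherry-picks v5.10 || exit
    timed_rebase --reapply-cherry-picks --onto v5.10 v5.4 || exit
else
    echo "skipping: $start not found; run this inside the linux checkout"
fi
```

If the gap between the two timed runs shrinks compared to the numbers
above, the cherry-pick search is where most of the extra time went.
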
> When looking at the timing of the output, the 10s time difference occur
> before "Rebasing (1/4)" is emitted.
>
> wgit rebase -sort --onto v5.10 v5.10
>
> behaves like
>
> wgit rebase -sort v5.10
>
> and if I only rebase the first two patches (instead of four) it still
> takes nearly the same time. Another test I did was:
>
> time wgit rebase -sort --onto v5.10 v5.7
>
> real 0m17.712s
> user 0m11.570s
> sys 0m5.396s
>
> So there seems to be something before the actual rebase is done that
> takes longer when HEAD..$base contains more objects.
> Given that
>
> ukl@xxxxxxxx:~/gsrc/linux$ time wgit log --oneline --cherry v5.10...0091ecb84cfdef0f4cb65810219f5ac9bb4341e5
> + 0091ecb84cfd (ptx/ukl/rebase-timing) nvmem: core: skip child nodes not matching binding
> + 38af1d38c542 spidev: add "hxxxxxxx,xxxxxx" compatible
> + a7edcfb6a968 regmap: fix memory leak in regmap_debugfs_init()
> + b1d90bc89408 pci: add quirk for txxxxx FPGA watchdog
>
> real 0m10.783s
> user 0m10.346s
> sys 0m0.436s
>
> I guess this range is searched for commits that have the same patch id
> as the patches to rebase?

Yep, and --reapply-cherry-picks removes this cherry-searching. Try it
and see how it affects your results. I don't think it'll entirely
eliminate the differences for you (it didn't for me), because there
appears to be some other weird overhead -- part of it from
can_fast_forward() and more that I didn't track down further. I do
think that the --reapply-cherry-picks will remove most of the
differences for you, though.

> FTR: In the above repo I have:
>
> ukl@xxxxxxxx:~/gsrc/linux$ wgit config merge.renameLimit
> 10000

Yep, so my choice of 9999 to try to reproduce your behavior was a pretty
good pick, eh? :-)

Hope that helps,
Elijah
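
As a footnote to the pack count discussed above: the "2116% beyond"
figure is the number of packs measured against the limit of 50. A small
sketch of that arithmetic follows; the function name
`pack_overage_percent` is made up for illustration, and the limit is
passed in by hand rather than read from any configuration.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative only): by what percentage
# does a pack count exceed a given limit?  Integer arithmetic.
pack_overage_percent() {
    packs=$1
    limit=$2
    echo $(( ($packs - $limit) * 100 / $limit ))
}

# Numbers from the alternate above: 1108 packs against a limit of 50.
pack_overage_percent 1108 50    # prints 2116
```

To use a live number instead, the `packs:` line of "git count-objects -v"
can be fed in, for example with
`pack_overage_percent "$(git count-objects -v | awk '/^packs:/ {print $2}')" 50`.
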