Marat Radchenko venit, vidit, dixit 13.07.2010 08:56:
> Hi.
>
> My setup:
> 0. Quad-core machine with 8GB of RAM, 10K RPM hdd.
> 1. SVN repo that I periodically fetch into the origin/trunk branch. Has ~200
> commits/day.
> 2. My local branch with 1-5 commits, which I often rebase against trunk.
> 3. I haven't rebased for 2 days, so I'm rebasing 3 (three) commits in my branch
> over 453 commits in trunk using "git rebase trunk".
> 4. trunk does contain files that are "bad" from a diff point of view (big & binary).
> 5. Sadly, the data in the repo is confidential.
>
> Expected: rebase takes some reasonable amount of time (< 1 min?).
>
> Actual: rebase takes 20 mins.
>
> Almost all of that time was spent running
> `git format-patch -k --stdout --full-index --ignore-if-in-upstream
> 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52`
> (that's the three commits from my branch) at 100% of one CPU core.
>
> Additional info:
>
> Another similar rebase, but over 4.5k commits, took 2 hours.
>
> Running without --ignore-if-in-upstream:
> $ time git format-patch -k --stdout --full-index \
> 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52 | wc -l
> 25823
>
> real 0m0.163s
> user 0m0.140s
> sys 0m0.020s
>
> Proof there are only three commits:
>
> $ git rev-list 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52
> d3fde4ae7497981a6fe61b0366b105477896cf52
> e18069258806bda6a6165822003f5e9fd958f906
> c8c2f2e157e615b73d0baab1d793a22991c9ba71
>
> Questions:
> 1. Is it expected behavior (branch you rebase onto has binary files -> no
> performance for you)?

Well, with --ignore-if-in-upstream, git has to compute a patch-id for every
upstream commit (merge-base..upstream) and compare it against the patch-ids
of the commits in merge-base..HEAD. So the cost grows with the 453 upstream
commits, not with your three.

> 2. If [1] is yes, is it possible to prevent rebase from running
> --ignore-if-in-upstream?
Not currently, but it will be with my upcoming patch ;) Of course, this has the
side effect that patches which have already been applied upstream (with a
different sha1) are no longer skipped.

> 3. If [1] is no, should I run some kind of profiler (how?) to determine what
> exactly causes such a performance drop?

It is the calculation of the patch-ids. Git first creates a "binary diff" and
then computes the patch-id (a sha1) of that diff. I am sure we could optimize
the patch-id calculation for binary diffs, which may be useful in addition to
being able to shut off the "cherry" check in rebase.

Michael
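The patch-id matching described above can be seen on a small scale. The script below is only a sketch: it builds a throwaway repository in which the branch names `topic` and `upstream`, the file names and the commit messages are all made up for illustration. A commit cherry-picked upstream gets a new sha1 but keeps the same patch-id, and `git cherry` (which, like `format-patch --ignore-if-in-upstream`, matches on patch-ids) marks it with "-".

```shell
#!/bin/sh
# Hypothetical demo in a throwaway repository: a commit cherry-picked
# upstream gets a new sha1 but keeps the same patch-id.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Local branch: one change that will also land upstream, one that won't.
git checkout -q -b topic
echo "shared change" >> file.txt
git commit -q -am "shared change"
echo "local only" > local.txt
git add local.txt
git commit -q -m "local only"

# Upstream: an unrelated commit, then the shared change cherry-picked on
# top of it, so the copy has a different parent and therefore a new sha1.
git checkout -q -b upstream "$base"
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "upstream work"
git cherry-pick topic~1 >/dev/null

shared_local=$(git rev-parse topic~1)
shared_upstream=$(git rev-parse upstream)
local_only=$(git rev-parse topic)

# git patch-id hashes the diff (ignoring whitespace and line numbers),
# so both copies of "shared change" get the same id.
pid_local=$(git diff-tree -p "$shared_local" | git patch-id | cut -d' ' -f1)
pid_upstream=$(git diff-tree -p "$shared_upstream" | git patch-id | cut -d' ' -f1)

# "-" marks a commit whose patch-id already exists upstream,
# "+" one that still has to be applied.
cherry_out=$(git cherry upstream topic)
printf '%s\n' "$cherry_out"

cd /
rm -rf "$repo"
```

In a real repository the upstream side of this comparison is every commit in merge-base..trunk, each diffed in full, big binaries included. Something like `time git cherry trunk HEAD` should, assuming it goes through the same patch-id code path, show whether that upstream side dominates the runtime.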