In this iteration, I have added more context and measurements to the commit
message. I have also made small improvements to the code suggested by
reviewers. I enhanced t9801-git-p4-branch.sh to test for the functionality,
namely that branches are branched off at the correct point in their parents'
history.

Signed-off-by: Joachim Kuebart joachim.kuebart@xxxxxxxxx

cc: Joachim Kuebart joachim.kuebart@xxxxxxxxx

Joachim Kuebart (2):
  git-p4: ensure complex branches are cloned correctly
  git-p4: speed up search for branch parent

 git-p4.py                | 21 ++++++++++-----------
 t/t9801-git-p4-branch.sh |  2 ++
 2 files changed, 12 insertions(+), 11 deletions(-)


base-commit: 311531c9de557d25ac087c1637818bd2aad6eb3a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1013%2Fjkuebart%2Fp4-faster-parent-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1013/jkuebart/p4-faster-parent-v2
Pull-Request: https://github.com/git/git/pull/1013

Range-diff vs v1:

 -:  ------------ > 1:  0ee0b7b55691 git-p4: ensure complex branches are cloned correctly
 1:  a171f7e6c023 ! 2:  41b3a23f682c git-p4: speed up search for branch parent
     @@ Metadata
       ## Commit message ##
          git-p4: speed up search for branch parent
      
     -    Previously, the code iterated through the parent branch commits and
     -    compared each one to the target tree using diff-tree.
     +    For every new branch that git-p4 imports, it needs to find the commit
     +    where it branched off its parent branch. While p4 doesn't record this
     +    information explicitly, the first changelist on a branch is usually an
     +    identical copy of the parent branch.
      
     -    This patch outputs the revision's tree hash along with the commit hash,
     -    thereby saving the diff-tree invocation. This results in a considerable
     -    speed-up, at least on Windows.
     +    The method searchParent() tries to find a commit in the history of the
     +    given "parent" branch whose tree exactly matches the initial changelist
     +    of the new branch, "target". The code iterates through the parent
     +    commits and compares each of them to this initial changelist using
     +    diff-tree.
     +
     +    Since we already know the tree object name we are looking for, spawning
     +    diff-tree for each commit is wasteful.
     +
     +    Use the "--format" option of "rev-list" to find out the tree object name
     +    of each commit in the history, and find the tree whose name is exactly
     +    the same as the tree of the target commit to optimize this.
     +
     +    This results in a considerable speed-up, at least on Windows. On one
     +    Windows machine with a fairly large repository of about 16000 commits in
     +    the parent branch, the current code takes over 7 minutes, while the new
     +    code only takes just over 10 seconds for the same changelist:
     +
     +    Before:
     +
     +    $ time git p4 sync
     +    Importing from/into multiple branches
     +    Depot paths: //depot
     +    Importing revision 31274 (100.0%)
     +    Updated branches:  b1
     +
     +    real    7m41.458s
     +    user    0m0.000s
     +    sys     0m0.077s
     +
     +    After:
     +
     +    $ time git p4 sync
     +    Importing from/into multiple branches
     +    Depot paths: //depot
     +    Importing revision 31274 (100.0%)
     +    Updated branches:  b1
     +
     +    real    0m10.235s
     +    user    0m0.000s
     +    sys     0m0.062s
      
          Signed-off-by: Joachim Kuebart <joachim.kuebart@xxxxxxxxx>
     +    Helped-by: Junio C Hamano <gitster@xxxxxxxxx>
     +    Helped-by: Luke Diamand <luke@xxxxxxxxxxx>
      
       ## git-p4.py ##
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
      
           def searchParent(self, parent, branch, target):
      -        parentFound = False
      -        for blob in read_pipe_lines(["git", "rev-list", "--reverse",
     -+        for tree in read_pipe_lines(["git", "rev-parse",
     -+                                     "{}^{{tree}}".format(target)]):
     -+            targetTree = tree.strip()
     -+        for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T",
     ++        targetTree = read_pipe(["git", "rev-parse",
     ++                                "{}^{{tree}}".format(target)]).strip()
     ++        for line in read_pipe_lines(["git", "rev-list", "--format=%H %T",
                                            "--no-merges", parent]):
      -            blob = blob.strip()
      -            if len(read_pipe(["git", "diff-tree", blob, target])) == 0:
      -                parentFound = True
     -+            if blob[:7] == "commit ":
     ++            if line.startswith("commit "):
      +                continue
     -+            blob = blob.strip().split(" ")
     -+            if blob[1] == targetTree:
     ++            commit, tree = line.strip().split(" ")
     ++            if tree == targetTree:
                        if self.verbose:
      -                    print("Found parent of %s in commit %s" % (branch, blob))
      -                break
     @@ git-p4.py: def importNewBranch(self, branch, maxChange):
      -            return blob
      -        else:
      -            return None
     -+                    print("Found parent of %s in commit %s" % (branch, blob[0]))
     -+                    return blob[0]
     ++                    print("Found parent of %s in commit %s" % (branch, commit))
     ++                    return commit
      +        return None
      
           def importChanges(self, changes, origin_revision=0):

-- 
gitgitgadget
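
For readers who want to try the tree-matching idea from the commit message in
isolation, here is a minimal standalone sketch. It is not the patch itself:
find_parent_by_tree and its arguments are hypothetical names, the object names
in the sample are made up, and the function works on already-captured output
of `git rev-list --format="%H %T" --no-merges <parent>` so that it runs
without a repository or git-p4's read_pipe helpers.

```python
def find_parent_by_tree(rev_list_lines, target_tree):
    """Return the first commit whose tree equals target_tree, else None.

    rev_list_lines: iterable of lines in the shape printed by
        git rev-list --format="%H %T" --no-merges <parent>
    where each formatted line is preceded by a "commit <hash>" header.
    target_tree: tree object name, e.g. from git rev-parse <target>^{tree}.
    """
    for line in rev_list_lines:
        # Skip the "commit <hash>" header that rev-list prints per commit.
        if line.startswith("commit "):
            continue
        fields = line.split()
        if len(fields) != 2:
            # Ignore blank or otherwise unexpected lines.
            continue
        commit, tree = fields
        # A plain string comparison replaces one diff-tree process per commit.
        if tree == target_tree:
            return commit
    return None


# Made-up object names, for illustration only (newest commit first).
sample_output = [
    "commit c3\n",
    "c3 t3\n",
    "commit c2\n",
    "c2 t2\n",
    "commit c1\n",
    "c1 t1\n",
]

print(find_parent_by_tree(sample_output, "t2"))
print(find_parent_by_tree(sample_output, "t9"))
```

With the sample above this prints c2 and then None. The search costs one
rev-list invocation plus a string comparison per commit, which is the source
of the speed-up described in the commit message.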