"Joachim Kuebart via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Joachim Kuebart <joachim.kuebart@xxxxxxxxx> Thanks. As git-p4 is not in my area of expertise, I'll make a style critique first, while pinging Luke as an area expert (you can learn who they are with "git shortlog --no-merges --since=18.months.ago git-p4.py"). > Previously, the code iterated through the parent branch commits and > compared each one to the target tree using diff-tree. It is customary in this project to describe the problem in the present tense. In other words, whoever is writing the log message still lives in the world without this patch applied to the system. The code iterates through the parent commits and compares each of them to the target tree using diff-tree. But before that sentence, please prepare the reader with a bit larger picture. A reader may not know what purpose the comparison serves. Do we know that the tree of one of the parents of the commit must match the tree of the target, and trying to see which parent is the one with the same tree? What is helped by learning which parent has the same tree? Perhaps The method searchParent() is used to find a commit in the history of the given 'parent' commit whose tree exactly matches the tree of the given 'target commit. The code iterates through the commits in the history and compares each of them to the target tree by invoking diff-tree. And then our log message would make observation, pointing out what is wrong with it (after all comparing with diff-tree is not giving us a wrong result---the point of this change is that spawning diff-tree for each commit is wasteful when we only want to see exact matches). Because we only are interested in finding a tree that is exactly the same, and not interested in how other trees are different, having to spawn diff-tree for each and every commit is wasteful. > This patch outputs the revision's tree hash along with the commit hash, > thereby saving the diff-tree invocation. This results in a considerable > speed-up, at least on Windows. And then our log message would order the codebase to "become like so", in order to resolve the issue(s) pointed out in the observation. Perhaps Use the "--format" option of "rev-list" to find out the tree object name of each commit in the history, and find the tree whose name is exactly the same as the tree of the target commit to optimize this. When making a claim on performance, it is helpful to our readers to give some numbers, even in a limited test, e.g. In a sample history where ~100 commits needed to be traversed to find the fork point on my Windows box, the current code took 10.4 seconds to complete, while the new code yields the same result in 1.8 seconds, which is a significant speed-up. or something along these lines. > Signed-off-by: Joachim Kuebart <joachim.kuebart@xxxxxxxxx> > git-p4.py | 22 +++++++++++----------- > 1 file changed, 11 insertions(+), 11 deletions(-) > > diff --git a/git-p4.py b/git-p4.py > index 09c9e93ac401..dbe94e6fb83b 100755 > --- a/git-p4.py > +++ b/git-p4.py > @@ -3600,19 +3600,19 @@ def importNewBranch(self, branch, maxChange): > return True > > def searchParent(self, parent, branch, target): > + for tree in read_pipe_lines(["git", "rev-parse", > + "{}^{{tree}}".format(target)]): > + targetTree = tree.strip() It looks very strange to run a commit that you expect a single line of output, and read the result in a loop. Doesn't git-p4.py supply a more suitable helper to read a single line output from a command? > + for blob in read_pipe_lines(["git", "rev-list", "--format=%H %T", > "--no-merges", parent]): This is not a new problem you introduced, but when we hear word "blob" in the context of this project, it reminds us of the "blob" object, while the 'blob' variable used in this loop has nothing to do with it. Perhaps rename it to say 'line' or something? > + if blob[:7] == "commit ": > + continue Perhaps blob.startswith("commit ") to avoid hardcoded string length? > + blob = blob.strip().split(" ") > + if blob[1] == targetTree: > if self.verbose: > + print("Found parent of %s in commit %s" % (branch, blob[0])) > + return blob[0] > + return None