On Mon, Apr 4, 2011 at 7:22 PM, Dan McGee <dpmcgee@xxxxxxxxx> wrote: > On Sun, Apr 3, 2011 at 1:55 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >> Dan McGee <dpmcgee@xxxxxxxxx> writes: >> >>> In the case of a wide breadth top-level tree (~2400 entries, all trees >>> in this case), we can see a noticeable cost in the profiler calling >>> strncmp() here. Most of the time we are at the base level of the >>> repository, so base is "" and baselen == 0, which means we will always >>> test true. Break out this one tiny case so we can short circuit the >>> strncmp() call. >> >> This sounds as if the patch helps only when you have a superfat tree at >> the "top-level" of the project, but wouldn't this benefit any superfat >> tree at _any_ level while we recursively descend into it? > > Correct. I looked at the fact that more often than not, we wouldn't > have to descend into subtrees unless searching for a path underneath > it, so that is why I phrased it that way. So the "in the case of" was > quite literally the case I was testing, but didn't mean to exclude > other potential test cases. > >>> This resulted in an ~11% improvement (43 to 38 secs) for a reasonable >>> log operation on the Arch Linux Packages SVN clone repository, which >>> contained 117220 commits and the aforementioned 2400 top-level objects: >>> git log -- autogen/trunk pacman/trunk/ wget/trunk/ >>> >>> Negligible slowdown was noted with other repositories (e.g. linux-2.6). >> >> It would have been easier to swallow if the last sentence were "This could >> lead to a slowdown in repositories without directories that are too wide, >> but in practice it was not even measurable." "Negligible" sounds as if it >> had still measurable downside, and as if you decided that the slowdown can >> be ignored---but obviously you are not an unbiased judge. > > Perhaps I was too cautious with my words- but I was also trying to not > be biased. Considering this same operation takes < 1 second in > linux-2.6, I only wanted to mention it could have a slight effect. In > reality I saw nothing more than an extra 0.01s or so, and definitely > nothing significant. Let me know if you see otherwise. > > dmcgee@galway ~/projects/linux-2.6 (master) > $ time ../git/git-log -- zzzzz_not_exist > /dev/null > > real 0m0.945s > user 0m0.857s > sys 0m0.083s > >> There is nothing wrong in the patch per-se, but I really wish we didn't >> have to do this; it feels like the compiler should be helping us in this >> case. If I resurrect this with an updated commit message reflecting concerns raised, can it be merged? Given that it is a noticeable performance boost on real-life repositories and I can show it has little (<1%) to no impact on most repos, it is a definite win. >>> Signed-off-by: Dan McGee <dpmcgee@xxxxxxxxx> >>> --- >>> tree-walk.c | 4 ++-- >>> 1 files changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/tree-walk.c b/tree-walk.c >>> index 9be8007..f386151 100644 >>> --- a/tree-walk.c >>> +++ b/tree-walk.c >>> @@ -591,8 +591,8 @@ int tree_entry_interesting(const struct name_entry *entry, >>> ps->max_depth); >>> } >>> >>> - /* Does the base match? */ >>> - if (!strncmp(base_str, match, baselen)) { >>> + /* Either there must be no base, or the base must match. */ >>> + if (baselen == 0 || !strncmp(base_str, match, baselen)) { >>> if (match_entry(entry, pathlen, >>> match + baselen, matchlen - baselen, >>> &never_interesting)) >> > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html