On Fri, Jul 02, 2010 at 03:26:12PM -0400, tytso@xxxxxxx wrote: > I just tried your patch, and with a large number of tags (198 tags, > from v2.6.11 to v2.6.34 with all of the -rc releases of the linux > kernel), it is indeed faster: 8.5 seconds without the patch versus 2.3 > seconds with the patch. > > However, if I remove a large number of tags (since I know this is > something that was introduced since 2.6.33, so I made a shared clone > of the repository but then I removed all of the tags from 2.6.11 > through 2.6.33, so there was only 19 tags in play), the time to > execute the git tag --contains became 1.3 seconds without the patch, > versus 2.9 seconds without the patch. > > So with the oldest tags removed, your patch actually made things run > *slower* (2.3 vs 2.9 seconds, which was counter-intuitive to me), and > fastest way to speed things up was to restrict the tags that would be > searched. I'm guessing that it is caused by the depth-first search that my patch does. If we follow the "wrong" parent of a merge (i.e., the one that the commit in question is not on), then we will end up hitting all commits down to the roots before backtracking and looking down the right parent. I noticed that my improved time for "git tag --contains" was similar to the total time for "git rev-list --all >/dev/null". Can you try timing that? My suspicion is that it is going to be about 2.9 seconds for you. So we could potentially improve my patch by doing a breadth-first search, although that is a bit trickier (since the point is to mark and prune whole subgraphs of history). But I'm not sure it is worth it in practice. It will make some lookups quicker, but in most cases you will end up going to the roots anyway for a negative lookup (in your case it was only faster because you got rid of all the old tags). A better strategy is to prune based on commit date so we _never_ go to the roots, even for a negative hit. That should give similar speedups as a breadth-first search would to your case, but also make the normal case much faster by quickly discarding nonsensical paths (e.g., there is not much point following a commit made 3 years ago to see if it contains a commit made yesterday). Try the patch below, which adds a date cutoff similar to the one used in name-rev. It's much faster in my tests: # stock "git tag --contains HEAD~200" real 0m2.179s user 0m2.156s sys 0m0.020s # my first patch real 0m0.621s user 0m0.588s sys 0m0.032s # this patch real 0m0.041s user 0m0.036s sys 0m0.004s For non-pathological cases, I expect it to perform equal to stock git at worst, and in practice much better (for pathological cases, its worst case is equivalent to "git rev-list --all", which is still not that bad). Let me know how it does on your case. The obvious downside is that it stops looking down a path in the face of extreme clock skew. We could perhaps allow a "--contains=thorough" to spend a little more time to achieve a better answer (i.e., ignore the cutoff date). diff --git a/builtin/tag.c b/builtin/tag.c index d311491..5768a44 100644 --- a/builtin/tag.c +++ b/builtin/tag.c @@ -10,6 +10,8 @@ #include "builtin.h" #include "refs.h" #include "tag.h" +#include "diff.h" +#include "revision.h" #include "run-command.h" #include "parse-options.h" @@ -31,6 +33,61 @@ struct tag_filter { #define PGP_SIGNATURE "-----BEGIN PGP SIGNATURE-----" +static int in_commit_list(const struct commit_list *want, struct commit *c) +{ + for (; want; want = want->next) + if (!hashcmp(want->item->object.sha1, c->object.sha1)) + return 1; + return 0; +} + +static int contains_recurse(const struct commit_list *want, + struct commit *candidate, + unsigned long cutoff) +{ + struct commit_list *p; + + /* was it previously marked as containing a want commit? */ + if (candidate->object.flags & TMP_MARK) + return 1; + /* or marked as not possibly containing a want commit? */ + if (candidate->object.flags & UNINTERESTING) + return 0; + /* or are we it? */ + if (in_commit_list(want, candidate)) + return 1; + + /* stop searching if we go too far back in time */ + parse_commit(candidate); + if (candidate->date < cutoff) + return 0; + + /* If not, then try parents, and be sure to mark ourselves + * for future traversals. */ + for (p = candidate->parents; p; p = p->next) { + if (contains_recurse(want, p->item, cutoff)) { + candidate->object.flags |= TMP_MARK; + return 1; + } + } + + candidate->object.flags |= UNINTERESTING; + return 0; +} + +static int contains(const struct commit_list *want, struct commit *candidate) +{ + const struct commit_list *c; + unsigned long min_date = ULONG_MAX; + for (c = want; c; c = c->next) { + parse_commit(c->item); + if (c->item->date < min_date) + min_date = c->item->date; + } + /* give a full day of clock skew slop */ + return contains_recurse(want, candidate, min_date - 86400); +} + static int show_reference(const char *refname, const unsigned char *sha1, int flag, void *cb_data) { @@ -49,7 +106,7 @@ static int show_reference(const char *refname, const unsigned char *sha1, commit = lookup_commit_reference_gently(sha1, 1); if (!commit) return 0; - if (!is_descendant_of(commit, filter->with_commit)) + if (!contains(filter->with_commit, commit)) return 0; } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html