On 4/4/2018 2:22 PM, Jeff King wrote:
On Wed, Apr 04, 2018 at 11:45:53AM -0400, Derrick Stolee wrote:
@@ -1615,8 +1619,20 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
struct contains_cache *cache)
{
struct contains_stack contains_stack = { 0, 0, NULL };
- enum contains_result result = contains_test(candidate, want, cache);
+ enum contains_result result;
+ uint32_t cutoff = GENERATION_NUMBER_UNDEF;
+ const struct commit_list *p;
+
+ for (p = want; p; p = p->next) {
+ struct commit *c = p->item;
+ parse_commit_or_die(c);
+ if (c->generation < cutoff)
+ cutoff = c->generation;
+ }
Now that you mention it, let me split out the portion you are probably
talking about as incorrect:
+ if (cutoff == GENERATION_NUMBER_UNDEF)
+ cutoff = GENERATION_NUMBER_NONE;
You're right, we don't want this. Since GENERATION_NUMBER_NONE == 0, we
get no benefit from this. If we keep it GENERATION_NUMBER_UNDEF, then
our walk will be limited to commits NOT in the commit-graph (which we
hope is small if proper hygiene is followed).
Hmm, on reflection, I'm not sure if this is right in the face of
multiple "want" commits, only some of which have generation numbers. We
probably want to disable the cutoff if _any_ "want" commit doesn't have
a number.
There's also an obvious corner case where this won't kick in, and you'd
really like it to: recently added commits. E.g,. if I do this:
git gc ;# imagine this writes generation numbers
git pull
git tag --contains HEAD
then HEAD isn't going to have a generation number. But this is the case
where we have the most to gain, since we could throw away all of the
ancient tags immediately upon seeing that their generation numbers are
way less than that of HEAD.
I wonder to what degree it's worth traversing to come up with a
generation number for the "want" commits. If we walked, say, 50 commits
to do it, you'd probably save a lot of work (since the alternative is
walking thousands of commits until you realize that some ancient "v1.0"
tag is not useful).
I'd actually go so far as to say that any amount of traversal is
generally going to be worth it to come up with the correct generation
cutoff here. You can come up with pathological cases where you only have
one really recent tag or something, but in practice every repository
where performance is a concern is going to end up with refs much further
back than it would take to reach the cutoff condition.
Perhaps there is some value in walking to find the correct cutoff value,
but it is difficult to determine how far we are from commits with
correct generation numbers _a priori_. I'd rather rely on the
commit-graph being in a good state, not too far behind the refs. An
added complexity of computing generation numbers dynamically is that we
would need to add a dependence on the commit-graph file's existence at all.
Thanks,
-Stolee