Re: [PATCH 7/6] ref-filter: use generation number for --contains

On 4/4/2018 2:22 PM, Jeff King wrote:
On Wed, Apr 04, 2018 at 11:45:53AM -0400, Derrick Stolee wrote:

@@ -1615,8 +1619,20 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
  					      struct contains_cache *cache)
  {
  	struct contains_stack contains_stack = { 0, 0, NULL };
-	enum contains_result result = contains_test(candidate, want, cache);
+	enum contains_result result;
+	uint32_t cutoff = GENERATION_NUMBER_UNDEF;
+	const struct commit_list *p;
+
+	for (p = want; p; p = p->next) {
+		struct commit *c = p->item;
+		parse_commit_or_die(c);
+		if (c->generation < cutoff)
+			cutoff = c->generation;
+	}

Now that you mention it, let me split out the portion you are probably talking about as incorrect:

+	if (cutoff == GENERATION_NUMBER_UNDEF)
+		cutoff = GENERATION_NUMBER_NONE;

You're right, we don't want this. Since GENERATION_NUMBER_NONE == 0, resetting the cutoff to it means no candidate is ever below the cutoff, so we get no benefit from this. If we keep the cutoff at GENERATION_NUMBER_UNDEF, then our walk will be limited to commits NOT in the commit-graph (which we hope is a small set if proper hygiene is followed).
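
To make the effect of the two sentinel values concrete, here is a small standalone sketch, not the patch itself. It assumes GENERATION_NUMBER_UNDEF is the largest uint32_t value and that the walk skips any candidate whose generation is below the cutoff; neither detail appears in the hunk above, and min_generation() and can_skip() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only NONE == 0 is stated in this thread; the UNDEF value is an assumption. */
#define GENERATION_NUMBER_NONE  0u
#define GENERATION_NUMBER_UNDEF UINT32_MAX

/* Hypothetical helper: smallest generation among the "want" commits. */
static uint32_t min_generation(const uint32_t *want_gens, size_t nr)
{
	uint32_t cutoff = GENERATION_NUMBER_UNDEF;
	size_t i;

	assert(nr == 0 || want_gens != NULL);
	for (i = 0; i < nr; i++)
		if (want_gens[i] < cutoff)
			cutoff = want_gens[i];
	return cutoff;
}

/* Assumed pruning test: a candidate below the cutoff cannot reach any "want". */
static int can_skip(uint32_t candidate_gen, uint32_t cutoff)
{
	return candidate_gen < cutoff;
}
```

With a cutoff of GENERATION_NUMBER_NONE, can_skip() is never true for an unsigned generation, which is why the reset buys nothing. With a cutoff of GENERATION_NUMBER_UNDEF, every candidate that has a real number is skipped.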

Hmm, on reflection, I'm not sure if this is right in the face of
multiple "want" commits, only some of which have generation numbers.  We
probably want to disable the cutoff if _any_ "want" commit doesn't have
a number.
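
One way to read that suggestion, as a rough sketch rather than a proposed patch: compute the minimum as before, but fall back to "no cutoff" as soon as any "want" lacks a real number. The value used for GENERATION_NUMBER_UNDEF and the helper name cutoff_or_disabled() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GENERATION_NUMBER_NONE  0u          /* stated in this thread */
#define GENERATION_NUMBER_UNDEF UINT32_MAX  /* assumed value */

/*
 * Hypothetical helper: returns the smallest generation among the "want"
 * commits, or GENERATION_NUMBER_NONE (meaning "prune nothing") if any of
 * them has no real generation number.
 */
static uint32_t cutoff_or_disabled(const uint32_t *want_gens, size_t nr)
{
	uint32_t cutoff = GENERATION_NUMBER_UNDEF;
	size_t i;

	assert(nr == 0 || want_gens != NULL);
	for (i = 0; i < nr; i++) {
		uint32_t g = want_gens[i];

		/* Any "want" without a real number disables the cutoff. */
		if (g == GENERATION_NUMBER_NONE || g == GENERATION_NUMBER_UNDEF)
			return GENERATION_NUMBER_NONE;
		if (g < cutoff)
			cutoff = g;
	}
	return cutoff;
}
```

For example, generations {5, 9} give a cutoff of 5, while {5, UNDEF} or {NONE, 7} give 0, so a "generation < cutoff" test would never skip a candidate in those cases.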

There's also an obvious corner case where this won't kick in, and you'd
really like it to: recently added commits. E.g., if I do this:

   git gc ;# imagine this writes generation numbers
   git pull
   git tag --contains HEAD

then HEAD isn't going to have a generation number. But this is the case
where we have the most to gain, since we could throw away all of the
ancient tags immediately upon seeing that their generation numbers are
way less than that of HEAD.

I wonder to what degree it's worth traversing to come up with a
generation number for the "want" commits. If we walked, say, 50 commits
to do it, you'd probably save a lot of work (since the alternative is
walking thousands of commits until you realize that some ancient "v1.0"
tag is not useful).

I'd actually go so far as to say that any amount of traversal is
generally going to be worth it to come up with the correct generation
cutoff here. You can come up with pathological cases where you only have
one really recent tag or something, but in practice every repository
where performance is a concern is going to end up with refs much further
back than it would take to reach the cutoff condition.
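
The bounded walk described above could look roughly like the following. This is only a sketch on a toy commit structure, not git's struct commit, and it assumes the usual definition of a generation number (one more than the largest parent generation, with root commits at 1). The names toy_commit, has_generation() and fill_generation() are hypothetical, and the caller picks the budget (for instance the 50 commits mentioned above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GENERATION_NUMBER_NONE  0u          /* stated in this thread */
#define GENERATION_NUMBER_UNDEF UINT32_MAX  /* assumed value */

/* Toy stand-in for a commit; not git's real layout. */
struct toy_commit {
	uint32_t generation;
	size_t nr_parents;
	struct toy_commit *parents[2];
};

static int has_generation(const struct toy_commit *c)
{
	return c->generation != GENERATION_NUMBER_NONE &&
	       c->generation != GENERATION_NUMBER_UNDEF;
}

/*
 * Return the generation of c, computing and caching it from its parents
 * when it is missing. Each commit that has to be computed consumes one
 * unit of *budget; if the budget runs out first, give up and return
 * GENERATION_NUMBER_UNDEF without assigning a number to c.
 */
static uint32_t fill_generation(struct toy_commit *c, int *budget)
{
	uint32_t max_parent = 0;
	size_t i;

	assert(c != NULL && budget != NULL);
	if (has_generation(c))
		return c->generation;
	if (*budget <= 0)
		return GENERATION_NUMBER_UNDEF;
	(*budget)--;

	for (i = 0; i < c->nr_parents; i++) {
		uint32_t g = fill_generation(c->parents[i], budget);

		if (g == GENERATION_NUMBER_UNDEF)
			return GENERATION_NUMBER_UNDEF;
		if (g > max_parent)
			max_parent = g;
	}
	c->generation = max_parent + 1;
	return c->generation;
}
```

If the walk stays within budget, the result can serve as the cutoff for that "want" commit; if it does not, the caller is back to the no-number case.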

Perhaps there is some value in walking to find the correct cutoff value, but it is difficult to determine how far we are from commits with correct generation numbers _a priori_. I'd rather rely on the commit-graph being in a good state, not too far behind the refs. An added complexity of computing generation numbers dynamically is that we would need to add a dependence on the commit-graph file's existence at all.

Thanks,
-Stolee


