Re: [PATCH 7/6] ref-filter: use generation number for --contains

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 04, 2018 at 03:06:26PM -0400, Derrick Stolee wrote:

> > > @@ -1615,8 +1619,20 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
> > >   					      struct contains_cache *cache)
> > >   {
> > >   	struct contains_stack contains_stack = { 0, 0, NULL };
> > > -	enum contains_result result = contains_test(candidate, want, cache);
> > > +	enum contains_result result;
> > > +	uint32_t cutoff = GENERATION_NUMBER_UNDEF;
> > > +	const struct commit_list *p;
> > > +
> > > +	for (p = want; p; p = p->next) {
> > > +		struct commit *c = p->item;
> > > +		parse_commit_or_die(c);
> > > +		if (c->generation < cutoff)
> > > +			cutoff = c->generation;
> > > +	}
> 
> Now that you mention it, let me split out the portion you are probably
> talking about as incorrect:
> 
> > > +	if (cutoff == GENERATION_NUMBER_UNDEF)
> > > +		cutoff = GENERATION_NUMBER_NONE;
> 
> You're right, we don't want this. Since GENERATION_NUMBER_NONE == 0, we get
> no benefit from this. If we keep it GENERATION_NUMBER_UNDEF, then our walk
> will be limited to commits NOT in the commit-graph (which we hope is small
> if proper hygiene is followed).

I think it's more than that. If we leave it at UNDEF, that's wrong,
because contains_test() compares:

  candidate->generation < cutoff

which would _always_ be true. In other words, we're saying that our
"want" has an insanely high generation number, and traversing can never
find it. Which is clearly wrong.

So we have to put it at "0", to say "you should always traverse, we
can't tell you that this is a dead end". So that part of the logic is
currently correct.

But what I was getting at is that the loop behavior can't just pick the
min cutoff. The min is effectively "0" if there's even a single ref for
which we don't have a generation number, because we cannot ever stop
traversing (we might get to that commit if we kept going).

(It's also possible I'm confused about how UNDEF and NONE are used; I'm
assuming commits for which we don't have a generation number available
would get UNDEF in their commit->generation field).

If you could make the assumption that when we have a generation for
commit X, then we have a generation for all of its ancestors, things get
easier. Because then if you hit commit X with a generation number and
want to compare it to a cutoff, you know that either:

  1. The cutoff is defined, in which case you can stop traversing if
     we've gone past the cutoff.

  2. The cutoff is undefined, in which case we cannot possibly reach
     our "want" by traversing. Even if it has a smaller generation
     number than us, it's on an unrelated line of development.

I don't know that the reachability property is explicitly promised by
your work, but it seems like it would be a natural fallout (after all,
you have to know the generation of each ancestor in order to compute the
later ones, so you're really just promising that you've actually stored
all the ones you've computed).

> > I wonder to what degree it's worth traversing to come up with a
> > generation number for the "want" commits. If we walked, say, 50 commits
> > to do it, you'd probably save a lot of work (since the alternative is
> > walking thousands of commits until you realize that some ancient "v1.0"
> > tag is not useful).
> > 
> > I'd actually go so far as to say that any amount of traversal is
> > generally going to be worth it to come up with the correct generation
> > cutoff here. You can come up with pathological cases where you only have
> > one really recent tag or something, but in practice every repository
> > where performance is a concern is going to end up with refs much further
> > back than it would take to reach the cutoff condition.
> 
> Perhaps there is some value in walking to find the correct cutoff value, but
> it is difficult to determine how far we are from commits with correct
> generation numbers _a priori_. I'd rather rely on the commit-graph being in
> a good state, not too far behind the refs. An added complexity of computing
> generation numbers dynamically is that we would need to add a dependence on
> the commit-graph file's existence at all.

If you could make the reachability assumption, I think this question
just goes away. As soon as you hit a commit with _any_ generation
number, you could quit traversing down that path.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux