Re: [PATCH v3 4/4] revision: avoid hitting packfiles when commits are in commit-graph

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 02, 2021 at 01:01:03PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
[snip]
> > +int find_object_in_graph(struct repository *repo, struct object *object)
> > +{
> > +	struct commit *commit;
> > +	uint32_t pos;
> > +
> > +	if (object->parsed) {
> > +		if (object->type != OBJ_COMMIT)
> > +			return -1;
> > +		return 0;
> 
> This is puzzling---at least it is not consistent with what the
> function name says ("please say if you find _this_ object in the
> commit-graph file"---if that is not what this function does, it
> needs a comment before the implementation).
> 
> The caller had object and we has already been parsed.  If the
> function were "with help from commit-graph, please tell me if you
> can positively say this is a commit", the above is understandable.
> If we know positively that it is not commit, we say "no, it is not a
> commit" (which may be suboptimal---if the caller falls back to
> another codepath, the object will still not be a commit) and if we
> know it is a commit, we can say "yes, it definitely is a commit" and
> the caller can stop there.
> 
> I guess my only problem with this function is that its name and what
> it does does not align.  If the caller uses it for the real purpose
> of the function I guessed, then the logic itself may be OK.

Fair point. The only caller for now only calls the function if the
object's type is unknown, so it really is "Resolve the commit if it is
one". I'll adjust the function's name.

> >  static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)
> >  {
> >  	uint32_t graph_pos = commit_graph_position(item);
> > @@ -871,18 +913,7 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
> >  		*pos = graph_pos;
> >  		return 1;
> >  	} else {
> > -		struct commit_graph *cur_g = g;
> > -		uint32_t lex_index;
> > -
> > -		while (cur_g && !bsearch_graph(cur_g, &(item->object.oid), &lex_index))
> > -			cur_g = cur_g->base_graph;
> > -
> > -		if (cur_g) {
> > -			*pos = lex_index + cur_g->num_commits_in_base;
> > -			return 1;
> > -		}
> > -
> > -		return 0;
> > +		return find_object_id_in_graph(&item->object.oid, g, pos);
> 
> And I think this one is a op-op refactoring that does not change the
> behaviour of find_commit_in_graph()?  It might be easier if done in
> a separate preparatory step, but it is small enough.

Will do.

One thing that occurred to me this morning after waking up is that this
commit changes semantics: if we're able to look up the commit via the
commit-graph, then we'll happily consider it to exist in the repository.
But given that we don't hit the object database at all anymore, it may
be that the commit-graph was out of date while the commit got
unreachable and thus pruned. So it may not even exist anymore in the
repository.

I wonder what our stance on this is. I can definitely understand the
angle that this would be a deal breaker given that we now claim commits
exist which don't anymore. On the other hand, we update commit-graphs
via git-gc(1), which makes this scenario a lot less likely nowadays. Is
there any precedent in our codebase where we treat commits part of the
commit-graph as existing? If not, do we want to make that assumption?

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux