Re: [PATCH v2] load_ref_decorations(): fix decoration with tags

Jeff King <peff@xxxxxxxx> · Tue, 13 Jul 2021 17:27:21 -0400

On Tue, Jul 13, 2021 at 02:17:53PM -0700, Junio C Hamano wrote:

> >> Note how this commit could have been done as an optimization before
> >> 88473c8bae: When our peeling hits a non-tag, we won't parse that tagged
> >> object only to immediately end the loop.
> >
> > Yep, thanks for mentioning this, as it's somewhat subtle.
> 
> It is too subtle that I am not sure what the paragraph wants to say.
> 
> Before 88473c8b, we had a fully parsed object in obj and entered the
> while() loop iff the outermost object is a tag, then we find the
> underlying object via obj->tagged.  We parse that underlying object
> to find if it is a tag, and break out if it is not.
> 
> By "this commit", I assume that the above mean the change in this
> fix, i.e. parse 'obj' if it has not been parsed before looking at
> its tagged field.  But I am not sure how that would have been an
> optimization before 88473c8b that gave a parsed tag object 'obj'
> upon entry to the loop.
> 
> Puzzled.

The optimization is that we are parsing tags before looking at their
structs, instead of always parsing the thing that the tag points to.

So in the old loop (pseudo-code for clarity):

  parse_object(obj);
  while (obj->type == OBJ_TAG) {
          obj = obj->tagged;
	  parse_object(obj);
  }

if we had a tag pointing to a commit, we'd parse the commit. But we
don't need to. We just need to know that it exists and is a commit.

In the new code, we parse only when we need to look at obj->tagged:

  while (obj->type == OBJ_TAG) {
          parse_object(obj);
	  obj = obj->tagged;
  }

So we must "somehow" know the type of "obj" in the first place, as well
as the type of every obj->tagged we look at. And that leads into your
question here:

> In any case, let's talk about this patch in the context to which it
> is designed to be applied, i.e. post 88473c8b3c8b.
> 
> When we come here, we have done oid_object_info() plus
> lookup_object_by_type() to obtain 'obj' and we know its type.
> Then we enter the loop.
> 
>  	while (obj->type == OBJ_TAG) {
> +		if (!obj->parsed)
> +			parse_object(the_repository, &obj->oid);
> 
> And we parse if it hasn't been parsed.  THat is why we can ...
> 
>  		obj = ((struct tag *)obj)->tagged;
>  		if (!obj)
>  			break;
> 
> ... look at its tagged member.
> 
> -		if (!obj->parsed)
> -			parse_object(the_repository, &obj->oid);
>  		add_name_decoration(DECORATION_REF_TAG, refname, obj);
> 
> And the updated 'obj' (i.e. direct referent of the tag object) is
> fed to add_name_decoration().  And then we move to the next
> iteration.
> 
> Now, do we know what type of object 'obj' is at this point?  We
> did parse the outermost tag upon entry to this loop, we replaced
> 'obj' variable with the referent by following the .tagged member,
> but we haven't parsed that object or ran oid_object_info() on it.
> 
> Puzzled.

...and the answer is that we don't need to parse it. The tag object
mentions the type of what it points to, and we use lookup_commit(), etc,
to create the object pointed to by its "tagged" field. If we can't do
that (say, because the tag is missing the type field, or we previously
saw the object as a different type, etc), then obj->tagged would be NULL
and we'd break out of the loop.

-Peff