Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 31, 2021 at 08:31:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Ævar's patch tries to improve the case where we might _know_ which is
> > correct (because we're actually parsing the object contents), but of
> > course it covers only a fraction of cases. I'm not really opposed to
> > that per se, but I probably wouldn't bother myself.
> 
> What fraction of cases? As far as I can tell it covers all cases where
> we get this error.
> 
> If there is a case like what you're describing I haven't found it.

It would happen any time somebody calls lookup_foo() because they saw an
object referenced, but _doesn't_ parse it. And then somebody later calls
lookup_bar() in the same way. Neither of them consulted the actual
object database.

Try this with your patches:

-- >8 --
git init repo
cd repo

# just for making things deterministic
export GIT_COMMITTER_NAME='A U Thor'
export GIT_COMMITTER_EMAIL='author@xxxxxxxxxxx'
export GIT_COMMITTER_DATE='@1234567890 +0000'

blob=$(echo foo | git hash-object -w --stdin)
git tag -m 'tag of blob' tag-of-blob $blob
git update-ref refs/tags/tag-of-commit $(
  git cat-file tag tag-of-blob |
  sed s/blob/commit/g |
  git hash-object -w --stdin -t tag
)
git update-ref refs/tags/tag-of-tree $(
  git cat-file tag tag-of-blob |
  sed s/blob/tree/g |
  git hash-object -w --stdin -t tag
)

git fsck
-- >8 --

That fsck produces (257cc5642 is the blob):

  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a blob, not a commit
  error: 257cc5642cb1a054f08cc83f2d943e56fd3ebe99: object could not be parsed: .git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
  error: aaff0d42df150e1a734f6a8516878b2ea315ee0a: object could not be parsed: .git/objects/aa/ff0d42df150e1a734f6a8516878b2ea315ee0a
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a blob
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7
  error: bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7: object could not be parsed: .git/objects/bb/d2b7077cd91ee6175cdc0e4c477c25c230cdc7

So we claim "is X, not Y" in multiple directions for the same object.

It might just be that there are spots in the fsck code that need to be
adjusted to use your new function (if they are indeed parsing the
referred-to object). But there are lots of places that don't actually
parse the object at the moment they're parsing the tag. E.g.:

  $ git for-each-ref --format='%(*objectname)'
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
  Segmentation fault

Neither of those types is the correct one. And the segfault is just a
bonus! :)

I'd expect similar cases with parsing commit parents and tree pointers.
And probably tree entries whose modes are wrong.

> I.e. it happens when we have an un-parsed "struct object" whose type is
> inferred, and parse it to find out it's not what we expected.
> 
> It's not ambigious at all what the object actually is. It's just that
> the previous code was leaking the *assumption* about the type at the
> time of emitting the error, due to an apparent oversight with parsed
> v.s. non-parsed.
> 
> Or in other words, we're leaking the implementation detail that we
> pre-allocated an object struct of a given type in anticipation of
> holding a parsed version of that object soon.

Right. In the case that you are indeed parsing the object later, you can
say definitively "it is X in the odb, but seen as Y previously". But we
do not always hit the "is X, not Y" error when parsing the object. It
might be caused by two of these "pre-allocations" (though really I think
it is not just an implementation detail; the pre-allocation happened
because some other object referred to us as a given type, so it really
is a corruption in the repository. Just not in the object we mention).

> > @@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
> >  		return obj;
> >  	}
> >  	else {
> > -		if (!quiet)
> > -			error(_("object %s is a %s, not a %s"),
> > -			      oid_to_hex(&obj->oid),
> > -			      type_name(obj->type), type_name(type));
> > +		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
> > +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> > +				error(_("object %s is a %s, but was referred to as a %s"),
> > +				      oid_to_hex(&obj->oid), type_name(obj->type),
> > +				      type_name(type));
> > +			else
> > +				error(_("object %s referred to as both a %s and a %s"),
> > +				      oid_to_hex(&obj->oid),
> > +				      type_name(obj->type), type_name(type));
> > +		}
> >  		return NULL;
> >  	}
> >  }
> 
> Per the above I don't understand how you think there's any uncertainty
> here.
> 
> If I'm right and there isn't then first of all I don't see how we could
> emit 1/2 of those errors. The whole problem here is that we don't know
> the type of the un-parsed object (and presumably don't want to eagerly
> know, it would mean hitting the object store).

Forgetting for a moment how to trigger it with actual Git commands, the
root of the problem is that:

  lookup_tree(&oid);
  lookup_blob(&oid);

is going to produce an error message. But we cannot know which object
type is wrong and which is right (if any). So we'd want to produce the
"referred to as both" message.

_If_ the caller happens to know that it has just parsed the object
contents and got a tree, then it would call lookup_parsed_tree(&oid),
which would pass along OBJECT_AS_TYPE_EXPECT_PARSED, and produce the
other message.

In practice, of course those two lookup_foo() calls are not right next
to each other. But they may be triggered on an identical oid by two
references from different objects.

> But when we do know why would we beat around the bush and say "was
> referred to as X and Y" once we know what it is. 
> 
> AFAICT there's no more reason to think that parse_object_buffer() will
> be wrong about the type than "git cat-file -t" will be. They both use
> the same underlying functions to get that information.

My point is that we are not always coming from parse_object_buffer()
when we see these error messages.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux