Re: [PATCH] open_sha1_file: report "most interesting" errno

Jeff King <peff@xxxxxxxx> · Thu, 15 May 2014 15:11:28 -0400

On Thu, May 15, 2014 at 10:02:06AM -0700, Junio C Hamano wrote:

> >     $ chmod 0 .git/objects/??/*
> >     $ git rev-list --all
> >     fatal: loose object b2d6fab18b92d49eac46dc3c5a0bcafabda20131 (stored in .git/objects/b2/d6fab18b92d49eac46dc3c5a0bcafabda20131) is corrupt
> 
> Hmmmmmmmm.  So we keep track of a more interesting errno we get from
> some other place than what we get for this local loose object, and
> we no longer give this message pointing at the local loose
> object---is that the idea?

Yes, though my main goal was to stop saying "corrupt" when that is not
the problem at all. Not pointing to the wrong object was a secondary
consideration. :)

I would also be happy to just show the error for the local object, even
if it exists somewhere else.  The main thing I am changing here is
that we currently _never_ show the errno from the main odb. We either
show the errno from the last alternate we looked at, or we show ENOENT
(because we explicitly set ENOENT right before looking at the
alternates).
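
To make the errno selection concrete, here is a minimal standalone
sketch of the idea. It is not the patch itself: open_first_candidate
and the plain path array are made-up stand-ins for the main odb plus
its alternates, and the only assumption is that ENOENT is the "boring"
error that any other failure should override.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper, not git's open_sha1_file(): try each candidate
 * path in order and return the first descriptor that opens.  If every
 * candidate fails, leave errno set to the first error that was not
 * ENOENT, or to ENOENT if every candidate was simply missing.
 */
static int open_first_candidate(const char *const *paths, size_t nr)
{
	int most_interesting_errno = ENOENT;
	size_t i;

	for (i = 0; i < nr; i++) {
		int fd = open(paths[i], O_RDONLY);
		if (fd >= 0)
			return fd;
		/* Only replace the boring ENOENT; keep a real error once seen. */
		if (most_interesting_errno == ENOENT)
			most_interesting_errno = errno;
	}
	errno = most_interesting_errno;
	return -1;
}
```

With this shape, the order of the candidates no longer decides which
error the caller sees: a permission or path problem on the first
candidate is not masked by a later "no such file".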

I think it's a separate problem that the "stored in..." is sometimes
wrong. That message appears when we get ENOENT and has_loose_object()
then returns true. IOW, we guess "we couldn't find it, but we claim to
have it, so it must have been corrupt". But that does not say _where_ we
found it, and our call to sha1_file_name is a guess that may be wrong.

I'm actually not sure if we can even trigger that code path now. It
depended on returning ENOENT from read_object, which we used to
frequently do erroneously. Now we will only do it when the object truly
does not exist, which means has_loose_object should generally not return
true.

I'm also a bit surprised that errno actually survives here. That clearly
was the intent, so I don't think my patch is making anything worse. But
it's possible that we would prepare_packed_git or open/mmap packfiles
between the call to open_sha1_file and when read_sha1_file_extended
looks at errno.
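
The fragility can be shown in isolation. This sketch is not what git
does today; open_saving_errno and unrelated_work are hypothetical names.
It only illustrates that copying errno immediately after the failing
call protects it from later library calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: open a path and copy errno into *saved_errno
 * right away, before any other call has a chance to overwrite it.
 */
static int open_saving_errno(const char *path, int *saved_errno)
{
	int fd = open(path, O_RDONLY);

	*saved_errno = fd < 0 ? errno : 0;
	return fd;
}

/*
 * Stand-in for unrelated work between the failed open and the error
 * report (for example, probing other files) that fails internally.
 */
static void unrelated_work(void)
{
	close(-1);	/* fails with EBADF, overwriting errno */
}
```

A caller that runs unrelated_work() after a failed open_saving_errno()
can still report the saved value, whereas the global errno would by then
describe the wrong failure.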

> What I am wondering is that this report we give in the new code
> 
> >     $ git rev-list --all
> >     fatal: failed to read object b2d6fab18b92d49eac46dc3c5a0bcafabda20131: Permission denied
> 
> may want to say which of the various possible places we saw this
> most interesting errno, which I think was the original motivation
> came from e8b15e61 (sha1_file: Show the the type and path to corrupt
> objects, 2010-06-10) that added "(stored in ...)".
> 
> But that may involve a larger surgery, and I definitely do not want
> to add unnecessary logic in the common-case codepath to keep track
> of pieces of information that are only used in the error codepath,
> so it smells like that this is the best fix to the issue the commit
> message describes.

Yes, I think doing this right would involve a lot more surgery, and I
don't know if it is worth the effort. But in addition to the problems
above, I note that we simply open the first object we can find, and do
not loop if mmap or checksums fail. So unlike packed objects, which are
resilient to corruption, we would fail immediately.

So I think the right thing to do would be:

  1. Don't loop across alternates in open_sha1_file. Loop in read_object
     (which means looping in _other_ calls to map_sha1_file, like in
     sha1_object_info).

  2. Fail quickly, since the common case is that we will find the object
     elsewhere. But when we do have an error, take time to go back and
     actually find the location of the object and the real error (i.e.,
     have a diagnose_object or something).
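
The two steps above could be sketched roughly as follows. None of these
functions exist in git; load_candidate, load_first_good and
diagnose_candidates are made-up names, "reading one byte" stands in for
mapping and verifying an object, and treating an empty file as EIO is an
assumption of the sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for "open, map and verify one copy". */
static int load_candidate(const char *path, char *first_byte)
{
	int fd = open(path, O_RDONLY);
	ssize_t got;
	int saved;

	if (fd < 0)
		return -1;
	got = read(fd, first_byte, 1);
	saved = errno;
	close(fd);
	if (got == 1)
		return 0;
	/* Assumption: an empty file counts as damaged and is reported as EIO. */
	errno = got < 0 ? saved : EIO;
	return -1;
}

/* Fast path: no bookkeeping; on any failure just try the next copy. */
static int load_first_good(const char *const *paths, size_t nr,
			   char *first_byte)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!load_candidate(paths[i], first_byte))
			return 0;
	return -1;
}

/*
 * Slow path, run only after the fast path failed: walk the candidates
 * again and report where the first real (non-ENOENT) error occurred.
 * Returns the errno value and stores the offending path in *where, or
 * returns ENOENT with *where set to NULL if nothing was found at all.
 */
static int diagnose_candidates(const char *const *paths, size_t nr,
			       const char **where)
{
	char scratch;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!load_candidate(paths[i], &scratch))
			continue;
		if (errno != ENOENT) {
			*where = paths[i];
			return errno;
		}
	}
	*where = NULL;
	return ENOENT;
}
```

The common case pays nothing for error reporting, and only the failure
path spends the extra pass needed to name the location and the real
error.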

Neither is a particularly high priority to me, though, so I don't plan
on working on them anytime soon. The only reason I went this far was
that I saw the "loose object is corrupt" / EPERM confusion in the real
world.

-Peff