Re: [PATCH] drop support for "experimental" loose objects

Jeff King <peff@xxxxxxxx> · Wed, 27 Nov 2013 14:03:19 -0500

On Wed, Nov 27, 2013 at 10:57:14AM -0800, Junio C Hamano wrote:

> > Yes, I think it is a reasonable addition to the streaming API. However,
> > I do not think there are any callsites which would currently want it.
> > All of the current users of stream_blob_to_fd use read_sha1_file as
> > their alternative, and not parse_object. So we are not verifying the
> > sha1 in either case (we may want to change that, of course, but that is
> > a bigger decision than just trying to bring streaming and non-streaming
> > code-paths into parity).
> 
> True. I am not offhand sure if we want to make read_sha1_file() to
> rehash, but I agree that it is a question different from what we are
> asking in this discussion.

I'm torn on that. Having git verify everything all the time is kind of
cool. But it _does_ have a performance impact, and the vast majority of
the time nothing got corrupted since the last time we looked at the
object. It seems like periodically running "git fsck" is a smarter way
of doing periodic checks.

We already are careful when objects are coming into the repository, and
I think that is a sensible boundary (and I am increasingly of the
opinion that running with transfer.fsckobjects off is not a good idea).

The checks in parse_object seem hack-ish to me, because they catch some
random subset of the times we access objects (e.g., calling parse_object
on a commit sha1 will check, but calling parse_commit on an unparsed
commit struct will not). If anything, I'd suggest moving the checking
down to read_sha1_file, which would add it fairly consistently
everywhere, and then tying it to a config option (off for high
performance, on for slower-but-meticulous).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html