Jeff King <peff@xxxxxxxx> writes:

> The vast majority of blobs in git.git will be stored as packed deltas.
> That means the streaming code will fall back to doing the regular
> in-core access. We _could_ therefore use that in-core copy to do our
> sha1 check rather than streaming; but of course we never get access to
> it outside of stream_blob_to_fd, and it is discarded. However, we do
> keep a copy in the delta base cache. When we immediately ask to unpack
> the exact same entry for check_sha1_signature, we can pull the copy
> straight out of the cache without having to re-inflate the object.

OK, that explains why the 20% overhead is lower than one would
naïvely expect. Thanks.

> Yes, I think it is a reasonable addition to the streaming API. However,
> I do not think there are any callsites which would currently want it.
> All of the current users of stream_blob_to_fd use read_sha1_file as
> their alternative, and not parse_object. So we are not verifying the
> sha1 in either case (we may want to change that, of course, but that is
> a bigger decision than just trying to bring streaming and non-streaming
> code-paths into parity).

True. I am not sure offhand whether we want to make read_sha1_file()
rehash, but I agree that it is a different question from the one we
are asking in this discussion.

> I also wondered if parse_object itself had problems with double-reading
> or failing to verify. But its use goes the opposite direction; it wants
> to verify the sha1 of the blob object, but it knows that it does not
> actually need the data. So it streams (as of 090ea12) to check the
> signature, but then discards each buffer-full after hashing it.
>
> -Peff
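
For readers following along, the "stream, hash, then discard each buffer-full" check described in the last quoted paragraph can be sketched roughly as below. This is only an illustration in Python, not git's C code: the names stream_blob_sha1 and check_blob_signature and the 8192-byte chunk size are hypothetical, and the sketch assumes the usual object hashing format (SHA-1 over the header "blob <size>" plus a NUL byte, followed by the content).

```python
import hashlib
import io


def stream_blob_sha1(stream, size, chunk_size=8192):
    """Hash a blob of `size` bytes from `stream` without holding it all in memory.

    Hypothetical helper: mirrors the idea of hashing each buffer-full
    and then discarding it, as described in the thread.
    """
    hasher = hashlib.sha1()
    # Object header: type, space, decimal size, NUL byte.
    hasher.update(b"blob %d\0" % size)
    remaining = size
    while remaining > 0:
        buf = stream.read(min(chunk_size, remaining))
        if not buf:
            raise ValueError("stream ended before %d bytes were read" % size)
        hasher.update(buf)      # fold this buffer-full into the hash
        remaining -= len(buf)   # buf is dropped on the next iteration
    return hasher.hexdigest()


def check_blob_signature(stream, size, expected_hex):
    """Return True if the streamed content hashes to `expected_hex`."""
    return stream_blob_sha1(stream, size) == expected_hex


if __name__ == "__main__":
    data = b"hello\n"
    print(stream_blob_sha1(io.BytesIO(data), len(data)))
```

The caller only learns whether the hash matches; the content itself is never retained, which is all a verify-only caller needs.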