On 04/30/2017 08:57 PM, Junio C Hamano wrote:
One thing I wonder is what the performance impact of a change like this to the codepath that wants to see if an object does _not_ exist in the repository. When creating a new object by hashing raw data, we see if an object with the same name already exists before writing the compressed loose object out (or comparing the payload to detect hash collision). With a "missing blob" support, we'd essentially spawn an extra process every time we want to create a new blob locally, and most of the time that is done only to hear the external command to say "no, we've never heard of such an object", with a possibly large latency. If we do not have to worry about that (or if it is no use to worry about it, because we cannot avoid it if we wanted to do the lazy loading of objects from elsewhere), then the patch presented here looked like a sensible first step towards the stated goal. Thanks.
Thanks for your comments. If you're referring to the codepath involving write_sha1_file() (for example, builtin/hash-object -> index_fd or builtin/unpack-objects), that is fine because write_sha1_file() invokes freshen_packed_object() and freshen_loose_object() directly to check if the object already exists (and thus does not invoke the new mechanism in this patch).
Having said that, looking at other parts of the fetching mechanism, there are a few calls to has_sha1_file() and others that might need to be checked. (We have already discussed one - the one in rev-list when invoked to check connectivity.) I could take a look at that, but was hoping for discussion on what I've sent so far (so that I know that I'm on the right track, and because it somewhat works, albeit slowly).