Junio C Hamano <gitster@xxxxxxxxx> writes:

>> Another difference with "write_loose_object()" is that we have no
>> chance to run "write_object_file_prepare()" to calculate the oid in
>> advance.
>
> That is somewhat curious.  Is it fundamentally impossible, or is it
> just that this patch was written in such a way that conflates the
> two and it is cumbersome to split the "we repeat the sequence of
> reading and deflating just a bit until we process all" and the "we
> compute the hash over the data first and then we write out for
> real"?

OK, the answer lies somewhere in between.

The initial user of this streaming interface reads from an incoming
packfile and feeds the inflated bytestream to the interface, which
means we cannot seek.  That makes it "fundamentally impossible" for
that codepath (i.e. unpack-objects reading from the packstream and
writing to on-disk loose objects).

But if the input source is seekable (e.g. a file in the working tree),
there is no fundamental reason why the new interface has "no chance to
run prepare to calculate the oid in advance".  It is just that such a
caller is not added by this series, and we chose not to allow the
"prepare and then write" two-step process because we do not currently
need it when this series lands.

> I am very tempted to ask why we do not do this to _all_ loose object
> files.  Instead of running the machinery twice over the data (once to
> compute the object name, then to compute the contents and write out),
> if we can produce loose object files of any size with a single pass,
> wouldn't that be an overall win?

There is a patch later in the series whose proposed log message has
benchmarks showing that it is slower in general.  It still is curious
where the slowness comes from, and whether it is something we can
tune, though.

Thanks.
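P.S.  For readers following along, the single-pass shape discussed
above can be sketched as follows.  This is a Python illustration of
the idea, not the series' C code; the function name, chunk size, and
use of a temporary-file-then-rename step are my assumptions about the
general technique.  Because the input stream cannot be seeked, the
object name (the hash) is only known after the last byte has been
processed, so the object is written under a temporary name and moved
into place at the end.  Note that the object's size is still needed up
front for the loose object header; in the unpack-objects case it is
known from the pack entry header even though the stream itself is not
seekable.

```python
import hashlib
import os
import tempfile
import zlib


def stream_loose_object(stream, size, objdir):
    """Single-pass write of a blob-like loose object from a
    non-seekable stream.

    Hashing and deflating happen in the same loop over the input, so
    the data is read exactly once.  The final pathname depends on the
    hash, which is unknown until the end, hence the temp file and the
    final rename.  (Illustrative sketch only.)
    """
    sha = hashlib.sha1()
    z = zlib.compressobj()
    header = b"blob %d\x00" % size
    sha.update(header)

    fd, tmp = tempfile.mkstemp(dir=objdir, prefix="tmp_obj_")
    with os.fdopen(fd, "wb") as out:
        out.write(z.compress(header))
        remaining = size
        while remaining:
            chunk = stream.read(min(8192, remaining))
            if not chunk:
                raise EOFError("short input stream")
            sha.update(chunk)           # hash ...
            out.write(z.compress(chunk))  # ... and deflate, same pass
            remaining -= len(chunk)
        out.write(z.flush())

    # Only now do we know the object name; move the file into place.
    oid = sha.hexdigest()
    final_dir = os.path.join(objdir, oid[:2])
    os.makedirs(final_dir, exist_ok=True)
    os.replace(tmp, os.path.join(final_dir, oid[2:]))
    return oid
```

The two-pass alternative being discussed would instead read the data
once to compute `oid`, then read it again to deflate and write, which
is only possible when the source is seekable.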