Re: [RFC PATCH 2/4] sha1_file: extract type and size from object_info

Jeff King <peff@xxxxxxxx> · Sat, 10 Jun 2017 03:01:33 -0400

On Fri, Jun 09, 2017 at 12:23:24PM -0700, Jonathan Tan wrote:

> Looking at the 3 primary functions (sha1_object_info_extended,
> read_object, has_sha1_file_with_flags), they independently implement
> mechanisms such as object replacement, retrying the packed store after
> failing to find the object in the packed store then the loose store, and
> being able to mark a packed object as bad and then retrying the whole
> process. Consolidating these mechanisms would be a great help to
> maintainability.
> 
> Such a consolidated function would need to handle the read_object() case
> (which returns the object data, type, and size) and the
> sha1_object_info_extended() case (which returns the object type, size,
> and some additional information, not all of which can be "turned off" by
> passing NULL in "struct object_info").

I like the idea of consolidating the logic. But I can't help but feel
that pulling these fields out of object_info is a step backwards. The
point of that struct is to let the caller specify which aspects of the
object they're interested in, and let the lookup function decide how
best to optimize the query.

So it seems like places which actually want to read the object should be
passing in a new field in the object_info for "yes, I actually want the
object contents, too", and then the consolidated function can decide
which approach to take based on whether or not the contents are
requested (e.g., unpacking the whole thing, or just the header).

If a caller asks for the contents but not the size, that's OK. We'd find
the size incidentally while unpacking the contents, but just not include
it in the returned object_info.

Another approach to this whole mess is to have a single function for
acquiring a "handle" to an object, along with functions to query aspects
of a handle. That would let callers iteratively ask for the parts they
care about, and we could lazily fill the handle info (so information we
pick up while servicing one property of the object gets cached and
returned for free if the caller asks for it later).

That's a much bigger change, though it may have other benefits (e.g., we
could be passing around handles instead of object buffers, which would
make it more natural to stream object content in many cases).

-Peff