Re: [RFC PATCH 2/4] sha1_file: extract type and size from object_info

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Mon, 12 Jun 2017 12:52:54 -0700

On Sat, 10 Jun 2017 03:01:33 -0400
Jeff King <peff@xxxxxxxx> wrote:

> On Fri, Jun 09, 2017 at 12:23:24PM -0700, Jonathan Tan wrote:
> 
> > Looking at the 3 primary functions (sha1_object_info_extended,
> > read_object, has_sha1_file_with_flags), they independently implement
> > mechanisms such as object replacement, retrying the packed store after
> > failing to find the object in the packed store then the loose store, and
> > being able to mark a packed object as bad and then retrying the whole
> > process. Consolidating these mechanisms would be a great help to
> > maintainability.
> > 
> > Such a consolidated function would need to handle the read_object() case
> > (which returns the object data, type, and size) and the
> > sha1_object_info_extended() case (which returns the object type, size,
> > and some additional information, not all of which can be "turned off" by
> > passing NULL in "struct object_info").
> 
> I like the idea of consolidating the logic. But I can't help but feel
> that pulling these fields out of object_info is a step backwards. The
> point of that struct is to let the caller specify which aspects of the
> object they're interested in

My issue was that there are some parts that cannot be turned off (in
particular, the object_info.u.packed part). Having said that, reading
the packed object itself should give us enough information to populate
that, so I'll take a look and see if this is possible.

> Another approach to this whole mess is to have a single function for
> acquiring a "handle" to an object, along with functions to query aspects
> of a handle. That would let callers iteratively ask for the parts they
> care about, and we could lazily fill the handle info (so information we
> pick up while servicing one property of the object gets cached and
> returned for free if the caller asks for it later).
> 
> That's a much bigger change, though it may have other benefits (e.g., we
> could be passing around handles instead of object buffers, which would
> make it more natural to stream object content in many cases).

There are a few safeguards that, I think, only work with the current
get-everything-then-forget-about-it approach (the packed-loose-packed
retry mechanism, and the desperate retry-if-corrupt-packed-object one).
If we have a handle with a cache, then, for example, we would lose the
ability to retry packed after loose if the handle has already declared
that the object is loose.