Junio C Hamano <gitster@xxxxxxxxx> wrote: > Eric Wong <e@xxxxxxxxx> writes: > > From: Jeff King <peff@xxxxxxxx> > > > > Avoid unnecessary round trips to the object store to speed > > up cat-file contents retrievals. The majority of packed objects > > don't benefit from the streaming interface at all and we end up > > having to load them in core anyways to satisfy our streaming > > API. > > What I found missing from the description is something like ... > > The new trick used is to teach oid_object_info_extended() that a > non-NULL oi->contentp that means "grab the contents of the objects > here" can be told to refrain from grabbing an object that is too > large. OK. > > diff --git a/object-file.c b/object-file.c > > index 065103be3e..1cc29c3c58 100644 > > --- a/object-file.c > > +++ b/object-file.c > > @@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r, > > > > if (!oi->contentp) > > break; > > + if (oi->content_limit && *oi->sizep > oi->content_limit) { > > I cannot convince myself enough to say "content limit" is a great > name. It invites "limited by what? text files are allowed but > images are not?". Hmm... naming is a most difficult problem :< ->slurp_max? It could be ->content_slurp_max, but I think that's too long... Would welcome other suggestions... > > diff --git a/object-store-ll.h b/object-store-ll.h > > index c5f2bb2fc2..b71a15f590 100644 > > --- a/object-store-ll.h > > +++ b/object-store-ll.h > > @@ -289,6 +289,7 @@ struct object_info { > > struct object_id *delta_base_oid; > > struct strbuf *type_name; > > void **contentp; > > + size_t content_limit; > > > > /* Response */ > > enum { > > diff --git a/packfile.c b/packfile.c > > index 4028763947..c12a0515b3 100644 > > --- a/packfile.c > > +++ b/packfile.c > > @@ -1529,7 +1529,7 @@ int packed_object_info(struct repository *r, struct packed_git *p, > > * We always get the representation type, but only convert it to > > * a "real" type later if the caller is interested. > > */ > > - if (oi->contentp) { > > + if (oi->contentp && !oi->content_limit) { > > *oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep, > > &type); > > if (!*oi->contentp) > > @@ -1555,6 +1555,17 @@ int packed_object_info(struct repository *r, struct packed_git *p, > > *oi->sizep = size; > > } > > } > > + > > + if (oi->contentp) { > > + if (oi->sizep && *oi->sizep < oi->content_limit) { > > It happens that with the current code structure, at this point, > oi->content_limit is _always_ non-zero. But it felt somewhat > fragile to rely on it, and I would have appreciated if this was > written with an explicit check for oi->content_limit, just like how > it is done in loose_object_info() function. Right. I actually think something like: assert(oi->content_limit); /* see `if' above */ if (oi->sizep && *oi->sizep < oi->content_limit) { is good for documentation purposes since this is in the `else' branch of the `if (oi->contentp && !oi->content_limit) {' condition.