Hi Junio, On 8 Mar 2022, at 11:59, Junio C Hamano wrote: > "John Cai via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> diff --git a/builtin/cat-file.c b/builtin/cat-file.c >> index 7b3f42950ec..ab9a49e13a4 100644 >> --- a/builtin/cat-file.c >> +++ b/builtin/cat-file.c >> @@ -351,6 +351,14 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d >> } >> } >> >> +static int print_default_format(char *buf, int len, struct expand_data *data) >> +{ >> + return xsnprintf(buf, len, "%s %s %"PRIuMAX"\n", oid_to_hex(&data->oid), >> + data->info.type_name->buf, >> + (uintmax_t)*data->info.sizep); >> + >> +} > > OK. We want size and type if we were to show the default output out > of the object-info API. > >> /* >> * If "pack" is non-NULL, then "offset" is the byte offset within the pack from >> * which the object may be accessed (though note that we may also rely on >> @@ -363,6 +371,11 @@ static void batch_object_write(const char *obj_name, >> struct packed_git *pack, >> off_t offset) >> { >> + struct strbuf type_name = STRBUF_INIT; >> + >> + if (!opt->format) >> + data->info.type_name = &type_name; > > And at this point, !opt->format means we would use the default > format, so we cannot leave .type_name member NULL. That is OK > but puzzling. Why didn't we need this before? > > If the caller is batch_objects(), there is the "mark_query" call to > strbuf_expand() to learn which field in data->info are needed, so > it seems that this new code should NOT be necessary. > > Side note. I briefly wondered if this expand is something you > would want to optimize when the default format is used, but this > is just "probe just once to ensure various members of data->info > are populated, to prepare for showing hundreds of objects in the > batch request", so it probably is not worth it. > > I am guessing that this is for callers that do not come via > batch_objects() where the "mark_query" strbuf_expand() is not made? > If so, > > * why is it sufficient to fill .type_name and not .sizep for the > default format (i.e. when opt->format is NULL)? > > * why is it OK not to do anything for non-default format? If no > "mark_query" call has been made, we wouldn't be preparing the > .type_name field even if the user-supplied format calls for > %(objecttype), would we? > > Looking at the call graph: > > - batch_object_write() is called by > - batch_one_object() > - batch_object_cb() > - batch_unordered_object() > > - batch_one_object() is called only by batch_objects() > - batch_object_cb() is used only by batch_objects() > > - batch_unordered_object() is called by > - batch_unordered_loose() > - batch_unordered_packed() > and these two are called only by batch_objects() > > And the "mark_query" strbuf_expand() to probe which members in > expand_data are are necessary is done very early, before any of the > calls batch_objects() makes that reach batch_object_write(). > > OK, so my initial guess that the new "we need .type_name member to > point at a strbuf" is because there are some code that bypasses the > "mark_query" strbuf_expand() in batch_objects() is totally wrong. > Everybody uses the "mark_query" thing. Then why do we need to ask > type_name? > > Going back to the new special case print_default_format() gives us > the answer to the question. It expects that data->info already > knows the stringified typename in the type_name member. The > original slow code path in expand_atom() uses this, instead: > > } else if (is_atom("objecttype", atom, len)) { > if (data->mark_query) > data->info.typep = &data->type; > else > strbuf_addstr(sb, type_name(data->type)); Thanks for going through this analysis! so looks like I am relying on oid_object_info_extended() which calls do_oid_object_info_extended(), which calls type_name(co->type) if oi->type_name is not NULL. This is a bit roundabout, so I like what you suggest below of just calling type_name() in print_default_format() directly. > > Which makes me wonder: > > * Is calling type_name(data->type) for many objects a lot less > efficient than asking the stringified type_name from the > object-info layer? I'm not sure, but I imagine that if the # of calls to type_name remain the same, eg: once per object that it wouldn't really matter much where in the stack it happens. Also, I took a look at type_name() in object.c and it's just a lookup in a constant array so that should be pretty fast. > If that is the case, would you gain > performance for all cases if you did this instead > > } else if (is_atom("objecttype", atom, len)) { > - if (data->mark_query) > - data->info.typep = &data->type; > - else > - strbuf_addstr(sb, type_name(data->type)); > + if (data->mark_query) { > + data->info.typep = &data->type; > + data->info.type_name = &data->type_name; > + } else { > + strbuf_addstr(sb, data->type_name); > + } > > in expand_atom()? I don't quite follow here. Would we add a member type_name to expand_data? Also where would the call to type_name() be to get the stringified type_name? Also I'm thinking this approach may not work well with the default format optimization as we would be skipping the strbuf_expand() call altogether when default format is used. > > Side note: I am keeping data->info.typep because a lot of > existing code switches on data->type, which is an enum. > > We may have to keep the strbuf_release() at the end of this > function this patch added, to release data->info.type_name, if we > go that route, but we wouldn't be dealing with an on-stack > type_name in this function. > > * If it does not make any difference between calling type_name() on > our side in expand_atom() or asking object-info API to do so, > then would it make more sense to lose the local type_name strbuf > and print type_name(data->type) in print_default_format() instead? I think this is the most intuitive solution. > > Other than that, this looks good to me. > > Thanks. thanks! John