On Sat, Aug 21, 2021 at 4:16 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote: > > Christian Couder <christian.couder@xxxxxxxxx> 于2021年8月20日周五 下午11:58写道: > > > > On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@xxxxxxxxx> wrote: > > > -static void grab_sub_body_contents(struct atom_value *val, int deref, > > > struct expand_data *data) > > > +static void grab_sub_body_contents(struct atom_value *val, int deref, > > > struct expand_data *data, int eaten) > > > { > > > int i; > > > const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL; > > > @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct > > > atom_value *val, int deref, struct exp > > > unsigned long buf_size = data->size; > > > > > > if (atom->u.raw_data.option == RAW_BARE) { > > > - v->s = buf; > > > + if (eaten) > > > + v->s = xmemdupz(buf, buf_size); > > > > Can the 'buf' already be eaten at this point? I thought that it could > > be eaten only through v->s so after this code. > > I think where the 'buf' is eaten is parse_object_buffer(). Yeah, right, and parse_object_buffer() seems to be called by get_object() which calls grab_values() after that. So the buf can indeed be eaten at that point. > set_commit_buffer() > or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it > manually. Yeah, but it doesn't mean that we cannot use the buf. So perhaps we could still use it, even if it's eaten. > In grab_sub_body_contents() we can indeed write the address of > 'buf' to v->s, but what I worry about is that before we completely write v->s to > the output buffer through append_atom(), will this buf be freed in > somewhere else? Yeah, that's a good question. But actually it seems that the buf is freed (at least in get_object()) only when it's not eaten. > > > + else > > > + v->s = buf; > > > v->s_size = buf_size; > > > } else if (atom->u.raw_data.option == RAW_LENGTH) { > > > v->s = xstrfmt_len(&v->s_size, > > > "%"PRIuMAX, (uintmax_t)buf_size); > > > > > > As parse_object_buffer() does internally: the buffer of commit/tree objects > > > needs to be copied, but blob/tag not. You said that the number of commits > > > is generally significantly greater than the others. It seems that we cannot > > > make full use of this idea. But remember the "skip_parse_buffer" before? > > > If we skip the parse_object_buffer(), this buffer is also "!eaten". > > > > > > In other words, those default paths of git cat-file --batch are included in it! > > > So from the perspective of performance regression, this patch will be > > > beneficial. > > > > Yeah, it seems that we can benefit from something like this, but it'd > > Only when the data allocated to us is dynamically allocated and we do have > the ownership of it, we can benefit by reducing copies and allocate > memory again. > > > be nice if things were clarified a bit more. > > OK. In get_object(), eaten means that the oi->content we obtained is cached > by git in parse_object_buffer(). This means that we don't need to free this buf > manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip > parse_buffer() when using some atoms, which can avoid some meaningless parsing. > So under this path, our'buf' is not cached. This means we have > ownership of it, so we > can free it freely, and benefit from this. In the patch we are not freeing it, we only duplicate it when it's eaten. So I am not sure that the above explanation are clear enough. That's the issue I have with this patch, but maybe the commit message can now be discussed and improved after sending the patch series to the mailing list instead of in this thread. Thanks!