Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?

Christian Couder <christian.couder@xxxxxxxxx> · Fri, 20 Aug 2021 17:58:07 +0200

On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
>
> ZheNing Hu <adlternative@xxxxxxxxx> 于2021年8月18日周三 下午5:07写道：
> > I think the 'buf' is not freed in ref-filter logic.
> >
> > But one thing worth noting is:
> >
> > parse_object_buffer() may take this buffer away, and store it in
> > tree->buffer or use set_commit_buffer() to store it in commit_slab.
> >
> > So in theory, as long as we don’t use parse_object_buffer(), this
> > dynamically allocated memory can be used freely for us. If we use
> > parse_object_buffer() and free the buffer, there seems to be a risk,
> > but it does not appear so far.
> >
>
> When parse_object_buffer() hints that we don’t free the buffer when
> eaten == 1, I have a better idea, If the buffer is "eaten", we copy it,
> otherwise we use the originally allocated space.
>
> -static void grab_sub_body_contents(struct atom_value *val, int deref,
> struct expand_data *data)
> +static void grab_sub_body_contents(struct atom_value *val, int deref,
> struct expand_data *data, int eaten)
>  {
>         int i;
>         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> atom_value *val, int deref, struct exp
>                         unsigned long buf_size = data->size;
>
>                         if (atom->u.raw_data.option == RAW_BARE) {
> -                               v->s = buf;
> +                               if (eaten)
> +                                       v->s = xmemdupz(buf, buf_size);

Can the 'buf' already be eaten at this point? I thought that it could
be eaten only through v->s so after this code.

> +                               else
> +                                       v->s = buf;
>                                 v->s_size = buf_size;
>                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
>                                 v->s = xstrfmt_len(&v->s_size,
> "%"PRIuMAX, (uintmax_t)buf_size);
>
> As parse_object_buffer() does internally: the buffer of commit/tree objects
> needs to be copied, but blob/tag not. You said that the number of commits
> is generally significantly greater than the others. It seems that we cannot
> make full use of this idea. But remember the "skip_parse_buffer" before?
> If we skip the parse_object_buffer(), this buffer is also "!eaten".
>
> In other words, those default paths of git cat-file --batch are included in it!
> So from the perspective of performance regression, this patch will be
> beneficial.

Yeah, it seems that we can benefit from something like this, but it'd
be nice if things were clarified a bit more.