Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?

Christian Couder <christian.couder@xxxxxxxxx> · Tue, 24 Aug 2021 09:11:14 +0200

On Sat, Aug 21, 2021 at 4:16 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> 于2021年8月20日周五 下午11:58写道：
> >
> > On Wed, Aug 18, 2021 at 1:11 PM ZheNing Hu <adlternative@xxxxxxxxx> wrote:

> > > -static void grab_sub_body_contents(struct atom_value *val, int deref,
> > > struct expand_data *data)
> > > +static void grab_sub_body_contents(struct atom_value *val, int deref,
> > > struct expand_data *data, int eaten)
> > >  {
> > >         int i;
> > >         const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
> > > @@ -1499,7 +1499,10 @@ static void grab_sub_body_contents(struct
> > > atom_value *val, int deref, struct exp
> > >                         unsigned long buf_size = data->size;
> > >
> > >                         if (atom->u.raw_data.option == RAW_BARE) {
> > > -                               v->s = buf;
> > > +                               if (eaten)
> > > +                                       v->s = xmemdupz(buf, buf_size);
> >
> > Can the 'buf' already be eaten at this point? I thought that it could
> > be eaten only through v->s so after this code.
>
> I think where the 'buf' is eaten is parse_object_buffer().

Yeah, right, and parse_object_buffer() seems to be called by
get_object() which calls grab_values() after that. So the buf can
indeed be eaten at that point.

> set_commit_buffer()
> or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
> manually.

Yeah, but it doesn't mean that we cannot use the buf. So perhaps we
could still use it, even if it's eaten.

> In grab_sub_body_contents() we can indeed write the address of
> 'buf' to v->s, but what I worry about is that before we completely write v->s to
> the output buffer through append_atom(), will this buf be freed in
> somewhere else?

Yeah, that's a good question. But actually it seems that the buf is
freed (at least in get_object()) only when it's not eaten.

> > > +                               else
> > > +                                       v->s = buf;
> > >                                 v->s_size = buf_size;
> > >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> > >                                 v->s = xstrfmt_len(&v->s_size,
> > > "%"PRIuMAX, (uintmax_t)buf_size);
> > >
> > > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > > needs to be copied, but blob/tag not. You said that the number of commits
> > > is generally significantly greater than the others. It seems that we cannot
> > > make full use of this idea. But remember the "skip_parse_buffer" before?
> > > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> > >
> > > In other words, those default paths of git cat-file --batch are included in it!
> > > So from the perspective of performance regression, this patch will be
> > > beneficial.
> >
> > Yeah, it seems that we can benefit from something like this, but it'd
>
> Only when the data allocated to us is dynamically allocated and we do have
> the ownership of it, we can benefit by reducing copies and allocate
> memory again.
>
> > be nice if things were clarified a bit more.
>
> OK. In get_object(), eaten means that the oi->content we obtained is cached
> by git in parse_object_buffer(). This means that we don't need to free this buf
> manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
> parse_buffer() when using some atoms, which can avoid some meaningless parsing.
> So under this path, our'buf' is not cached. This means we have
> ownership of it, so we
> can free it freely, and benefit from this.

In the patch we are not freeing it, we only duplicate it when it's
eaten. So I am not sure that the above explanation are clear enough.
That's the issue I have with this patch, but maybe the commit message
can now be discussed and improved after sending the patch series to
the mailing list instead of in this thread.

Thanks!