Re: [GSOC] [QUESTION] ref-filter: can %(raw) implement reuse oi.content?

ZheNing Hu <adlternative@xxxxxxxxx> · Wed, 25 Aug 2021 16:11:45 +0800

Christian Couder <christian.couder@xxxxxxxxx> 于2021年8月24日周二 下午3:11写道：
>
> > set_commit_buffer()
> > or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it
> > manually.
>
> Yeah, but it doesn't mean that we cannot use the buf. So perhaps we
> could still use it, even if it's eaten.
>

Yes. Just like the following question: How do we know that the eaten 'buf' has
not been free() by some logic in git when we want to use it?

> > In grab_sub_body_contents() we can indeed write the address of
> > 'buf' to v->s, but what I worry about is that before we completely write v->s to
> > the output buffer through append_atom(), will this buf be freed in
> > somewhere else?
>
> Yeah, that's a good question. But actually it seems that the buf is
> freed (at least in get_object()) only when it's not eaten.
>

get_object() will free the 'buf' which has not been eaten, but we can
take use of
it (v->s = buf and we free it later)

> > > > +                               else
> > > > +                                       v->s = buf;
> > > >                                 v->s_size = buf_size;
> > > >                         } else if (atom->u.raw_data.option == RAW_LENGTH) {
> > > >                                 v->s = xstrfmt_len(&v->s_size,
> > > > "%"PRIuMAX, (uintmax_t)buf_size);
> > > >
> > > > As parse_object_buffer() does internally: the buffer of commit/tree objects
> > > > needs to be copied, but blob/tag not. You said that the number of commits
> > > > is generally significantly greater than the others. It seems that we cannot
> > > > make full use of this idea. But remember the "skip_parse_buffer" before?
> > > > If we skip the parse_object_buffer(), this buffer is also "!eaten".
> > > >
> > > > In other words, those default paths of git cat-file --batch are included in it!
> > > > So from the perspective of performance regression, this patch will be
> > > > beneficial.
> > >
> > > Yeah, it seems that we can benefit from something like this, but it'd
> >
> > Only when the data allocated to us is dynamically allocated and we do have
> > the ownership of it, we can benefit by reducing copies and allocate
> > memory again.
> >
> > > be nice if things were clarified a bit more.
> >
> > OK. In get_object(), eaten means that the oi->content we obtained is cached
> > by git in parse_object_buffer(). This means that we don't need to free this buf
> > manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip
> > parse_buffer() when using some atoms, which can avoid some meaningless parsing.
> > So under this path, our'buf' is not cached. This means we have
> > ownership of it, so we
> > can free it freely, and benefit from this.
>
> In the patch we are not freeing it, we only duplicate it when it's
> eaten. So I am not sure that the above explanation are clear enough.
> That's the issue I have with this patch, but maybe the commit message
> can now be discussed and improved after sending the patch series to
> the mailing list instead of in this thread.
>

My thoughts are:

1. those 'buf' which have been eaten, because we don't know when they
will be freed, if we
use it in append_atom(), it may be a little unsafe if it have been
freed, but it is safe to use after
copying it;
2. those  uneaten 'buf', it can be used safely.

Maybe we can guarantee that the buf have not been freed when append_atom()?
I don't have a good grasp.

> Thanks!

Thanks.
--
ZheNing Hu