Christian Couder <christian.couder@xxxxxxxxx> 于2021年8月24日周二 下午3:11写道: > > > set_commit_buffer() > > or parse_tree_buffer() cache the 'buf'. 'eaten' means we don't need to free it > > manually. > > Yeah, but it doesn't mean that we cannot use the buf. So perhaps we > could still use it, even if it's eaten. > Yes. Just like the following question: How do we know that the eaten 'buf' has not been free() by some logic in git when we want to use it? > > In grab_sub_body_contents() we can indeed write the address of > > 'buf' to v->s, but what I worry about is that before we completely write v->s to > > the output buffer through append_atom(), will this buf be freed in > > somewhere else? > > Yeah, that's a good question. But actually it seems that the buf is > freed (at least in get_object()) only when it's not eaten. > get_object() will free the 'buf' which has not been eaten, but we can take use of it (v->s = buf and we free it later) > > > > + else > > > > + v->s = buf; > > > > v->s_size = buf_size; > > > > } else if (atom->u.raw_data.option == RAW_LENGTH) { > > > > v->s = xstrfmt_len(&v->s_size, > > > > "%"PRIuMAX, (uintmax_t)buf_size); > > > > > > > > As parse_object_buffer() does internally: the buffer of commit/tree objects > > > > needs to be copied, but blob/tag not. You said that the number of commits > > > > is generally significantly greater than the others. It seems that we cannot > > > > make full use of this idea. But remember the "skip_parse_buffer" before? > > > > If we skip the parse_object_buffer(), this buffer is also "!eaten". > > > > > > > > In other words, those default paths of git cat-file --batch are included in it! > > > > So from the perspective of performance regression, this patch will be > > > > beneficial. > > > > > > Yeah, it seems that we can benefit from something like this, but it'd > > > > Only when the data allocated to us is dynamically allocated and we do have > > the ownership of it, we can benefit by reducing copies and allocate > > memory again. > > > > > be nice if things were clarified a bit more. > > > > OK. In get_object(), eaten means that the oi->content we obtained is cached > > by git in parse_object_buffer(). This means that we don't need to free this buf > > manually. In 28511adfa5 ([GSOC] ref-filter: skip parse_object_buffer()), we skip > > parse_buffer() when using some atoms, which can avoid some meaningless parsing. > > So under this path, our'buf' is not cached. This means we have > > ownership of it, so we > > can free it freely, and benefit from this. > > In the patch we are not freeing it, we only duplicate it when it's > eaten. So I am not sure that the above explanation are clear enough. > That's the issue I have with this patch, but maybe the commit message > can now be discussed and improved after sending the patch series to > the mailing list instead of in this thread. > My thoughts are: 1. those 'buf' which have been eaten, because we don't know when they will be freed, if we use it in append_atom(), it may be a little unsafe if it have been freed, but it is safe to use after copying it; 2. those uneaten 'buf', it can be used safely. Maybe we can guarantee that the buf have not been freed when append_atom()? I don't have a good grasp. > Thanks! Thanks. -- ZheNing Hu