[GSOC][QUESTION] How to parse the properties of the object at once

ZheNing Hu <adlternative@xxxxxxxxx> · Sat, 7 Aug 2021 14:32:51 +0800

Hi guys,

parse_object_buffer() calls parse_tag_buffer() and parse_commit_buffer() to
parse object data and store it in `struct tag` and `struct commit`, so that
the parsed data can later be read directly by functions such as
grab_tag_values() and grab_commit_values().

But parse_object_buffer() only parses part of the object data, so ref-filter
needs additional parsing in functions such as grab_person() and
grab_sub_body_contents(). This parses the same buffer repeatedly and hurts
performance.

So I am wondering whether we could add some members to `struct commit` or
`struct tag`, so that more kinds of data can be collected in the same parsing
pass.
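To make "collect more data in one pass" concrete, here is a minimal sketch
that scans a raw commit buffer once and records the tree, the parent count,
the author and committer lines, and the message as (pointer, length) spans
into the buffer. `struct parsed_commit` and parse_commit_once() are
hypothetical names, not existing Git API; the real parse_commit_buffer() does
more work (for example, resolving parent commits), which this omits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A (pointer, length) pair into the caller's buffer; nothing is copied. */
struct span {
	const char *ptr;
	size_t len;
};

/* Hypothetical result struct, not part of Git. */
struct parsed_commit {
	struct span tree;      /* hex object name after "tree " */
	size_t nr_parents;     /* number of "parent " lines */
	struct span author;    /* "Name <email> timestamp tz" */
	struct span committer;
	struct span message;   /* everything after the first blank line */
};

/*
 * Scan the header once, line by line, recording every field we care about.
 * Returns 0 on success, -1 if no blank line terminates the header.
 */
static int parse_commit_once(const char *buf, size_t size,
			     struct parsed_commit *out)
{
	const char *p = buf, *end = buf + size;

	assert(buf && out);
	memset(out, 0, sizeof(*out));
	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		size_t len;

		if (!eol)
			return -1;
		len = (size_t)(eol - p);
		if (!len) { /* blank line: the message follows */
			out->message.ptr = eol + 1;
			out->message.len = (size_t)(end - (eol + 1));
			return 0;
		}
		if (len > 5 && !memcmp(p, "tree ", 5)) {
			out->tree.ptr = p + 5;
			out->tree.len = len - 5;
		} else if (len > 7 && !memcmp(p, "parent ", 7)) {
			out->nr_parents++;
		} else if (len > 7 && !memcmp(p, "author ", 7)) {
			out->author.ptr = p + 7;
			out->author.len = len - 7;
		} else if (len > 10 && !memcmp(p, "committer ", 10)) {
			out->committer.ptr = p + 10;
			out->committer.len = len - 10;
		}
		/* Other headers (encoding, gpgsig and its continuation lines) are skipped. */
		p = eol + 1;
	}
	return -1;
}
```

A caller could then read the author or the body from the spans without
scanning the buffer a second time.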

At the same time, this extra parsing should be optional: like
oid_object_info_extended(), we could provide several pointers that let the
caller decide which kinds of data it needs. That way, callers that don't need
them would not pay much of a performance cost.
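The opt-in idea can be sketched in the same spirit as the pointer fields
passed to oid_object_info_extended(): the caller points a member at its own
storage to request a field, or leaves it NULL to skip it, and the parser
stops as soon as nothing else is wanted. `struct commit_parse_request` and
parse_commit_optional() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical request struct: point a member at caller storage to ask for
 * that field, or leave it NULL to skip it.
 */
struct commit_parse_request {
	const char **authorp;
	size_t *author_lenp;
	const char **messagep;
	size_t *message_lenp;
};

/*
 * Returns 0 when done (a missing header leaves its output untouched),
 * or -1 if the header has no terminating blank line.
 */
static int parse_commit_optional(const char *buf, size_t size,
				 const struct commit_parse_request *req)
{
	const char *p = buf, *end = buf + size;
	int need_author;

	assert(buf && req);
	assert(!req->authorp || req->author_lenp);
	assert(!req->messagep || req->message_lenp);
	need_author = req->authorp != NULL;
	if (!need_author && !req->messagep)
		return 0; /* nothing requested: no scanning at all */
	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		size_t len;

		if (!eol)
			return -1;
		len = (size_t)(eol - p);
		if (!len) { /* blank line: the message follows */
			if (req->messagep) {
				*req->messagep = eol + 1;
				*req->message_lenp = (size_t)(end - (eol + 1));
			}
			return 0;
		}
		if (need_author && len > 7 && !memcmp(p, "author ", 7)) {
			*req->authorp = p + 7;
			*req->author_lenp = len - 7;
			need_author = 0;
			if (!req->messagep)
				return 0; /* stop early: the rest is not needed */
		}
		p = eol + 1;
	}
	return -1;
}
```

The early return is where the saving comes from: a caller that only wants
the author never walks past the author line.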

But commit.h contains this comment:

/*
 * The size of this struct matters in full repo walk operations like
 * 'git clone' or 'git gc'. Consider using commit-slab to attach data
 * to a commit instead of adding new fields here.
 */

This means that I shouldn't add fields to struct commit. So I looked at the
`commit-slab` code, but it seems to me that using it would still require
additional parsing. What I hope for is that parse_commit_buffer() parses the
commit data only once.
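The commit.h comment suggests keeping per-commit data in a separate table
indexed by the commit rather than enlarging every struct commit. Whether that
avoids extra parsing depends on who fills the table; the sketch below only
shows the storage side, where results of a single parse could be cached. It
is a simplified, self-contained imitation of the idea, assuming a small
per-commit integer index; `struct fake_commit`, `struct parsed_extra` and
extra_slab_at() are hypothetical, and Git's real API (define_commit_slab() in
commit-slab.h) differs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct commit: only a small per-commit index is needed. */
struct fake_commit {
	unsigned int index;
};

/* Extra parsed data kept outside the commit struct. */
struct parsed_extra {
	int parsed;            /* 0 until the commit buffer has been scanned */
	const char *author;
	size_t author_len;
};

/*
 * Side table indexed by commit->index: a simplified imitation of the idea
 * behind commit-slab, not Git's actual define_commit_slab() API.
 */
struct extra_slab {
	struct parsed_extra *items;
	size_t nr;
};

/*
 * Return the entry for c, growing the table (zero-filled) if needed.
 * Growth uses realloc(), so previously returned pointers may become stale.
 */
static struct parsed_extra *extra_slab_at(struct extra_slab *s,
					  const struct fake_commit *c)
{
	assert(s && c);
	if (c->index >= s->nr) {
		size_t new_nr = ((size_t)c->index + 1) * 2;
		struct parsed_extra *p = realloc(s->items, new_nr * sizeof(*p));

		if (!p)
			return NULL;
		memset(p + s->nr, 0, (new_nr - s->nr) * sizeof(*p));
		s->items = p;
		s->nr = new_nr;
	}
	return &s->items[c->index];
}

/* Release the table; entries only point into buffers owned elsewhere. */
static void extra_slab_clear(struct extra_slab *s)
{
	free(s->items);
	s->items = NULL;
	s->nr = 0;
}
```

The commit struct itself stays the same size; commits that are never looked
up cost nothing beyond the table's unused slots.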

I am also wondering whether to build a large "struct object_view" that
stores both the parse state and the parsed results of an object's properties.
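One possible shape for such a "struct object_view", sketched under
assumptions: the view keeps the raw buffer, a bitmask recording which
properties have been parsed, and the results; the first request for any
property scans the buffer once and fills all of them. Only the author and
subject are handled, the flag names and accessor functions are hypothetical,
and taking the subject as just the first message line is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags recording which properties have already been parsed. */
enum view_field {
	VIEW_AUTHOR  = 1u << 0,
	VIEW_SUBJECT = 1u << 1,
};

/* Hypothetical "struct object_view": raw buffer plus parse state and results. */
struct object_view {
	const char *buf;
	size_t size;
	unsigned parsed;       /* bitmask of enum view_field */
	unsigned scans;        /* how many times the buffer was scanned */
	const char *author;    /* NULL if the header has no author line */
	size_t author_len;
	const char *subject;   /* first message line (simplified), NULL if none */
	size_t subject_len;
};

/* One pass fills every field, so later lookups never rescan the buffer. */
static void object_view_scan(struct object_view *v)
{
	const char *p = v->buf, *end = v->buf + v->size;

	v->scans++;
	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		size_t len = eol ? (size_t)(eol - p) : (size_t)(end - p);

		if (!len) { /* blank line: the subject is the next line */
			const char *s = p + 1;
			const char *s_end = s < end ? memchr(s, '\n', (size_t)(end - s)) : NULL;

			v->subject = s;
			v->subject_len = s_end ? (size_t)(s_end - s) : (size_t)(end - s);
			break;
		}
		if (len > 7 && !memcmp(p, "author ", 7)) {
			v->author = p + 7;
			v->author_len = len - 7;
		}
		if (!eol)
			break;
		p = eol + 1;
	}
	/* Mark both as parsed even if absent, so a miss is not rescanned. */
	v->parsed |= VIEW_AUTHOR | VIEW_SUBJECT;
}

static const char *object_view_author(struct object_view *v, size_t *len)
{
	assert(v && len);
	if (!(v->parsed & VIEW_AUTHOR))
		object_view_scan(v);
	*len = v->author_len;
	return v->author;
}

static const char *object_view_subject(struct object_view *v, size_t *len)
{
	assert(v && len);
	if (!(v->parsed & VIEW_SUBJECT))
		object_view_scan(v);
	*len = v->subject_len;
	return v->subject;
}
```

Asking for the subject and then the author scans the buffer only once; the
`scans` counter exists only to make that visible.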

Any good ideas?
--
ZheNing Hu