Christian Couder <christian.couder@xxxxxxxxx> wrote on Tue, Aug 10, 2021 at 4:04 PM:
>
> > parse_object_buffer(), let's take a look at the result of gprof again:
> >
> > We need to call grab_sub_body_contents() and grab_person() to rescan the
> > buffer and extract the data.
> > What if we could combine these multiple scans and parses into one pass?
> > At least intuitively, this has an opportunity to improve performance.
>
> Yeah, but is there a way to check that we indeed scan or parse the
> same objects multiple times? This way we might get an idea about how
> much scanning and parsing we could save.
>

I think find_subpos() called by grab_sub_body_contents() and
find_wholine() called by grab_person() are evidence that we do scan the
same buffer repeatedly. But the proportion of time they take is too
small: 0.0142% and 0.0109%.

Sorry, but my attempts over the past two days have not gone well. The
changes needed here would make the program very complicated, so this
optimization is not worth doing.

> > So I checked the implementation
> > details of `parse_commit_buffer()` and `parse_tag_buffer()`. Maybe we
> > can pass some "hook pointer"
> > to these parsing functions, like `oid_object_info_extended()` does, to
> > extract only the information we need?
>
> Would this also avoid scanning and parsing the same object many times?
>

Like oid_object_info_extended()? I think it lets the caller set pointers
for the values it wants and fills in only those. Well, the problem it
solves may be a little different from the one here.

> > The commit-slab caught my attention. It can be used to get some
> > specified data content from the object.
>
> I thought it was for storing commit data in an efficient way.
>

Yeah.

> > I am thinking about whether it is possible to design a `struct
> > object_view` (temporarily called
> > `struct commit_view`) to store the offsets of the parsed data in the
> > object content. `parse_commit_buffer()`
> > would check whether we need something for in-depth parsing.
> > Like this:
> >
> > ```c
> > struct commit_view {
> >         int need_tree : 1;
> >         int need_parents : 1;
> >
> >         int need_author : 1;
> >         int need_author_name : 1;
> >         int need_author_email : 1;
> >         int need_author_date : 1;
> >
> >         int need_committer : 1;
> >         int need_committer_name : 1;
> >         int need_committer_email : 1;
> >         int need_committer_date : 1;
>
> Is the above info specific for each commit? Or will the above be the
> same for all the commits we are processing?
>

In what I had in mind, it is the same for all commits.

> Ok, so the idea is to use the commit slab feature to store the struct
> commit_view instances. That seems reasonable to me.
>

It's a pity that it may not bring much of a speedup in practice.

> > It seems that GSoC has only a few weeks left, and I'm not sure how
> > far this patch series is from
> > being merged into the master branch. Performance optimization may have
> > no end.
>
> Yeah, but the idea for now is just to make using the ref-filter code
> as fast as the current code.
>

It seems difficult to match the current performance. As we discussed
before, the old `git cat-file --batch` can only provide a few kinds of
metadata that do not require parsing the object, whereas using the
ref-filter logic lets cat-file provide much more structured information
about git objects. But this means taking a more expensive path that
requires many traversals and many copies. If we really use the logic in
ref-filter, we can only optimize it partially; we can't expect it to be
as fast as the old code. The only exception would be if we could avoid
the ref-filter logic itself while still using the new atoms in
ref-filter, which might give us a chance to avoid the copies made in
ref-filter. I'm not sure what your opinion is...

Thanks.
--
ZheNing Hu