Hi, Peff, Jeff King <peff@xxxxxxxx> 于2021年4月2日周五 下午11:39写道: > > On Fri, Apr 02, 2021 at 05:03:17PM +0800, ZheNing Hu wrote: > > > * Because part of the feature of `git for-each-ref` is very similar to > > that of `git cat-file`, I think `git cat-file` can learn some feasible > > solutions from it. > > > > #### My possible solutions: > > > > 1. Same [solution](https://github.com/git/git/pull/568/commits/cc40c464e813fc7a6bd93a01661646114d694d76) > > as Olga, add member `struct ref_format format` in `struct > > batch_options`. > > 2. Use the function > > [`verify_ref_format()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L904) > > to replace the first `expand_format()` for parsing format strings. > > 3. Write a function like > > [`format_ref_array_item()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L2392), > > get information about objects, and use `get_object()` to grub the > > information which we prefer (or just use `grab_common_value()`). > > 4. The migration of `%(rest)` may require learning the handling of > > `%(if)` ,`%(else)`. > > I think one thing to keep an eye on here is the performance of cat-file. > The formatting code used by for-each-ref is rather slow (it may load > more of the object details than is necessary, it is too eager to > allocate intermediate strings, and so on). That's usually not _too_ big > a problem for ref-filter, because the number of refs tends to be much > smaller than the number of total objects. But I'd expect that moving to > the ref-filter code would make something like: > > git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' > > measurably slower. > Forgive me for thinking about the whole question too simple. It seems that there are a lot of points to think about in this project. > IMHO the right path forward is not to try porting cat-file to use the > ref-filter code, but to start first with writing a universal formatting > module that takes the best of both implementations (and the commit > pretty-printer): > > - separate the format-parsing stage from formatting actual items, as > ref-filter does. This lets us have more complex formats without > paying a per-item runtime cost while formatting. This should also > allow us to handle multiple syntaxes for the same thing (e.g., > ref-filter %(authorname) vs pretty.c %an). > This is a good suggestion. Olga seems to have wanted to remove `mark_query` in `struct expand_data`, I think she also wanted to separate the two parts. The ref-filter uses `used_atom` as the result of parsing `%(atom)`, It’s really worth learning. > - figure out which data will be needed for each item based on the > parsed format, and then do the minimum amount of work to get that > data (using "oid_object_info_extended()" helps here, because it > likewise tries to do as little work as possible to satisfy the > request, but there are many elements that it doesn't know about) > I have indeed noticed that `oid_object_info_extended()` can get information about the object which we actually want. In `cat-file.c`, It has been used in `batch_object_write()`, and `expanding_atom()` specify what data we need. In `ref-filter.c`, It has been used in `get_object()`. I am not sure what you mean about "many elements that it doesn't know about", For the time being, `cat-file` can get 5 kind of objects info it need. Maybe you think that `cat-file` can learn some features in `ref-filter` to extend the function of `cat-file --batch`? E.g. %(objectname:short)? I think I may have a better understanding of the topic of this mini-project now. We may not want to port the logic of cat-file,but to learn some design in `ref-filter`, right? > - likewise avoid doing any intermediate work we can; as much as > possible, format the result directly into a result strbuf, rather > than allocating many sub-strings and assembling them (as cat-file > does). > I guess you mean `scratch` in `batch_object_write()` every time new content is added after `strbuf_reset`, but refilter just append messages to `final_buf`. > - handle formats where the necessary item data may or may not be > present. E.g., if we're given a refname, then "%(refname)" makes > sense. But in cat-file we'd not have a refname, and just an object. > We should still be able to use the same formatting code to handle > "%(objecttype)", etc. Likewise for formats which require a specific > type (say %(authorname) for a commit, but the object is a blob). > Ref-filter does this to some degree for things like authorname, but > we'd be extending it to the case that we don't even have a refname. > I may not have a very deep understanding of some details. On this issue, I think we can use `info_source` to invalidate interfaces that are not of the same type(only allow SOUCR_OTHER) Let me summarize: First part : Parsing any type of atoms, whether it is %an or %(authorname). Second part : Find all functions that get objects information (which `oid_object_info_extended()` can't get) Third part : Optimize multiple small strings in cat-file into one `finnal_buf`. Forth part : Error handling for unsuitable atoms. Que: These task sound like the logic of `ref-filter`, so if I can participate in this project later, should I start with optimizing the logic of `ref-filter`, right? > -Peff Thank you so much! -- ZheNing Hu