On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:

> From: ZheNing Hu <adlternative@xxxxxxxxx>
>
> In order to let cat-file use ref-filter logic, let's do the
> following:
>
> 1. Change the type of member `format` in struct `batch_options`
>    to `ref_format`, we will pass it to ref-filter later.
> 2. Let `batch_objects()` add atoms to format, and use
>    `verify_ref_format()` to check atoms.
> 3. Use `format_ref_array_item()` in `batch_object_write()` to
>    get the formatted data corresponding to the object. If the
>    return value of `format_ref_array_item()` is equals to zero,
>    use `batch_write()` to print object data; else if the return
>    value is less than zero, use `die()` to print the error message
>    and exit; else if return value is greater than zero, only print
>    the error message, but don't exit.
> 4. Use free_ref_array_item_value() to free ref_array_item's
>    value.
>
> Most of the atoms in `for-each-ref --format` are now supported,
> such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> `%(flag)`, `%(HEAD)`, because these atoms are unique to those objects
> that pointed to by a ref, "for-each-ref"'s family can naturally use
> these atoms, but not all objects are pointed to be a ref, so "cat-file"
> will not be able to use them.
>
> The performance for `git cat-file --batch-all-objects
> --batch-check` on the Git repository itself with performance
> testing tool `hyperfine` changes from 669.4 ms ± 31.1 ms to
> 1.134 s ± 0.063 s.
>
> The performance for `git cat-file --batch-all-objects --batch
> >/dev/null` on the Git repository itself with performance testing
> tool `time` change from "27.37s user 0.29s system 98% cpu 28.089
> total" to "33.69s user 1.54s system 87% cpu 40.258 total".

This new feature is really nice, but that's a really bad performance regression.
A lot of software in the wild relies on "cat-file --batch" to be *the* performant interface to git for mass-extraction of object data. That's an increase of ~70% and ~20%, respectively. Have you dug into (e.g. with a profiler) where we're now spending all this time?
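
For reference, here is a quick sketch of the arithmetic behind those percentages. It uses only the figures quoted above; the helper name `pct_increase` is made up for illustration.

```python
def pct_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# --batch-check: hyperfine mean, in milliseconds (669.4 ms -> 1.134 s)
batch_check = pct_increase(669.4, 1134.0)

# --batch: user CPU seconds reported by `time` (27.37s -> 33.69s)
batch_user = pct_increase(27.37, 33.69)

# --batch: total (wall-clock) seconds reported by `time` (28.089 -> 40.258)
batch_total = pct_increase(28.089, 40.258)

print(f"--batch-check mean:  +{batch_check:.0f}%")
print(f"--batch user time:   +{batch_user:.0f}%")
print(f"--batch total time:  +{batch_total:.0f}%")
```

This gives roughly 69% for --batch-check and 23% for the --batch user time, matching the ~70% and ~20% above. Going by the "total" column instead, the --batch slowdown is closer to 43%.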