On Fri, Jul 2, 2021 at 9:39 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
>
> On Thu, Jul 01 2021, ZheNing Hu via GitGitGadget wrote:
>
> > From: ZheNing Hu <adlternative@xxxxxxxxx>
> >
> > In order to let cat-file use ref-filter logic, let's do the
> > following:
> >
> > 1. Change the type of member `format` in struct `batch_options`
> >    to `ref_format`; we will pass it to ref-filter later.
> > 2. Let `batch_objects()` add atoms to format, and use
> >    `verify_ref_format()` to check atoms.
> > 3. Use `format_ref_array_item()` in `batch_object_write()` to
> >    get the formatted data corresponding to the object. If the
> >    return value of `format_ref_array_item()` is equal to zero,
> >    use `batch_write()` to print object data; else if the return
> >    value is less than zero, use `die()` to print the error message
> >    and exit; else if the return value is greater than zero, only
> >    print the error message, but don't exit.
> > 4. Use `free_ref_array_item_value()` to free ref_array_item's
> >    value.
> >
> > Most of the atoms in `for-each-ref --format` are now supported,
> > such as `%(tree)`, `%(parent)`, `%(author)`, `%(tagger)`, `%(if)`,
> > `%(then)`, `%(else)`, `%(end)`. But these atoms will be rejected:
> > `%(refname)`, `%(symref)`, `%(upstream)`, `%(push)`, `%(worktreepath)`,
> > `%(flag)`, `%(HEAD)`, because these atoms are unique to objects
> > that are pointed to by a ref. The "for-each-ref" family can naturally
> > use these atoms, but not all objects are pointed to by a ref, so
> > "cat-file" will not be able to use them.
> >
> > The performance for `git cat-file --batch-all-objects
> > --batch-check` on the Git repository itself with performance
> > testing tool `hyperfine` changes from 669.4 ms ± 31.1 ms to
> > 1.134 s ± 0.063 s.
> >
> > The performance for
> > `git cat-file --batch-all-objects --batch >/dev/null` on the Git
> > repository itself with performance testing tool `time` changes
> > from "27.37s user 0.29s system 98% cpu 28.089 total" to
> > "33.69s user 1.54s system 87% cpu 40.258 total".
>
> This new feature is really nice, but that's a really bad performance
> regression. A lot of software in the wild relies on "cat-file --batch"
> to be *the* performant interface to git for mass-extraction of object
> data.
>

Thanks, this performance is indeed worrying.

> That's an increase of ~70% and ~20%, respectively. Have you dug into
> (e.g. with a profiler) where we're now spending all this time?

See the two attached performance flame graphs: oid_object_info_extended(),
called from get_object(), is the main performance bottleneck.

--
ZheNing Hu
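
For reference, the "~70% and ~20%" figures can be re-derived from the
timings quoted in the commit message. The following is only a small
arithmetic sketch in Python; the variable and function names are made up
for illustration, and the wall-clock comparison is an extra data point
that the thread itself does not call out.

```python
# Timings copied from the quoted commit message.
batch_check_before_s = 0.6694  # 669.4 ms, hyperfine mean, --batch-check
batch_check_after_s = 1.134    # 1.134 s, hyperfine mean, --batch-check

batch_user_before_s = 27.37    # user CPU seconds from `time`, --batch
batch_user_after_s = 33.69
batch_wall_before_s = 28.089   # wall-clock "total" from `time`, --batch
batch_wall_after_s = 40.258


def percent_increase(before: float, after: float) -> float:
    """Return the relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


batch_check_pct = percent_increase(batch_check_before_s, batch_check_after_s)
batch_user_pct = percent_increase(batch_user_before_s, batch_user_after_s)
batch_wall_pct = percent_increase(batch_wall_before_s, batch_wall_after_s)

print(f"--batch-check:        +{batch_check_pct:.1f}%")
print(f"--batch (user CPU):   +{batch_user_pct:.1f}%")
print(f"--batch (wall clock): +{batch_wall_pct:.1f}%")
```

This gives roughly +69% for `--batch-check` and +23% for `--batch` when
comparing user CPU time, which matches the rounded figures above. Measured
by wall-clock time instead, the `--batch` case comes out at roughly +43%.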
Attachment:
cat-file-batch-batch-all-objects.svg
Description: image/svg
Attachment:
cat-file-batch-check-batch-all-objects.svg
Description: image/svg
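
Step 3 of the quoted commit message describes a three-way handling of the
return value of `format_ref_array_item()`. The control flow can be
sketched as below. This is a Python illustration only, not the Git C
code: `write_object`, `demo_formatter` and the tuple-returning formatter
are hypothetical stand-ins, and `SystemExit` merely approximates what
`die()` does.

```python
import io
from typing import Callable, TextIO, Tuple

# Hypothetical formatter: returns (status, formatted_output, error_message).
Formatter = Callable[[str], Tuple[int, str, str]]


def write_object(obj_id: str, format_item: Formatter,
                 out: TextIO, err: TextIO) -> bool:
    """Format one object and dispatch on the status code.

    status == 0: write the formatted data (the batch_write() case)
    status <  0: report the error and exit (the die() case)
    status >  0: report the error, then carry on with the next object
    """
    status, output, error = format_item(obj_id)
    if status == 0:
        out.write(output)
        return True
    if status < 0:
        raise SystemExit(f"fatal: {error}")
    err.write(f"{error}\n")
    return False


def demo_formatter(obj_id: str) -> Tuple[int, str, str]:
    """Toy formatter used only to exercise the three branches."""
    if obj_id == "missing":
        return 1, "", f"{obj_id} missing"
    if obj_id == "corrupt":
        return -1, "", f"bad object {obj_id}"
    return 0, f"{obj_id} blob 42\n", ""


out, err = io.StringIO(), io.StringIO()
results = [write_object(name, demo_formatter, out, err)
           for name in ("abc123", "missing", "def456")]
print(out.getvalue(), end="")
print(err.getvalue(), end="")
```

In this toy run the "missing" object only produces an error message and
processing continues with the next object, while an input of "corrupt"
would stop the whole run.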