On Sun, Aug 1, 2021 at 8:45 AM ZheNing Hu <adlternative@xxxxxxxxx> wrote:
>
> My eleventh week blog finished:
> The web version is here:
> https://adlternative.github.io/GSOC-Git-Blog-11/

Great, thanks!

> ### Attempt to optimize performance
>
> This week, at the prompt of my mentor Christian, I used `gprof` for some
> performance tests about `git cat-file --batch`:
> [Re: [GSOC] How to improve the performance of git cat-file --batch]
> (https://lore.kernel.org/git/CAOLTT8TdL7UhfVSOzbpmo-WFNrcKwmy=E720tNt4KM9o_p=keg@xxxxxxxxxxxxxx/)

[...]

> Here is an amazing fact:
>
> The number of calls to `lookup_object()` before and after using my
> patch are 0 and 522709 respectively. Therefore, I am very surprised,
> why do we have these additional calls?
> After printing the call stack of `lookup_object()`, we can know that
> the `parse_buffer()` is calling them.

s/the `parse_buffer()`/`parse_buffer()`/

or

s/the `parse_buffer()`/the `parse_buffer()` function/

Also:

s/them/it/

> A very straightforward idea, can we avoid calling this function?
>
> In `parse_object_buffer()`, `parse_blob_buffer()`, ``parse_tree_buffer()`,
> `parse_commit_buffer()`, and `parse_tag_buffer()` parse the object

s/parse/we parse/

> data, and then store
> it in `struct object *obj`, finally return it to the caller.

Maybe:

s/finally/and finally/

> `get_object()` will feed the `obj` to `grab_values()`, and then
> `grab_values()` will feed the `obj` to `grab_tag_values()`,
> `grab_commit_values()`, which can fill the object info in `obj` to
> implement some atom, e.g. `%(tag)`, `%(type)`, `%(object)`, `%(tree)`,
> `%(numparent)`,`%(parent)`.
> It is worth noting that `%(objectname)`, `%(objecttype)`,
> `%(objectsize)`, `%(deltabase)`, `%(rest)`,
> `%(raw)` are did not appear in them, this means that we can avoid

s/are did not/don't/

> parsing object buffer when we
> don't use those atoms which require `obj`'s information!
>
> After some processing and adaptation, I made the patch which can skip

s/the patch/a patch/

> `parse_object_buffer()`
> in some cases, this is the result of the performance test of
> `t/perf/p1006-cat-file.sh`:
>
> ```
> Test                                        HEAD~             HEAD
> ------------------------------------------------------------------------------------
> 1006.2: cat-file --batch-check              0.10(0.09+0.00)   0.11(0.10+0.00) +10.0%
> 1006.3: cat-file --batch-check with atoms   0.09(0.08+0.01)   0.09(0.06+0.03) +0.0%
> 1006.4: cat-file --batch                    0.62(0.58+0.04)   0.57(0.54+0.03) -8.1%
> 1006.5: cat-file --batch with atoms         0.63(0.60+0.02)   0.52(0.49+0.02) -17.5%
> ```
>
> We can see that the performance of `git cat-file --batch` has been a
> certain improvement!

Yeah, sure -8.1% or -17.5% is really nice! But why +10.0% for
`cat-file --batch-check`?

> Tell a joke: removing 1984531500 if checking can reduce the startup
> time of GTA5 by 70%. :-D

s/if checking/checks/

As this joke refers to:

https://rockstarintel.com/a-fan-reduces-gta-online-loading-times-by-70

it might be nice to add a link to help people like me who didn't know
about this and had to google it.

> Currently the patch has not been submitted to the mailing list, let us
> wait a bit...

Looking forward to it...
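
For readers following along, the idea described in the blog post (skip the
object parse when no requested atom needs `obj`) can be sketched in a few
lines of standalone C. This is only an illustration under stated
assumptions, not the actual patch and not Git's real ref-filter code: the
`atoms_needing_obj` table and the `atom_needs_obj()` and
`format_needs_parse()` helpers are hypothetical names made up for this
sketch, and only the atom names themselves come from the blog post.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical table: the atoms the blog post lists as being filled in
 * from the parsed `obj` by grab_tag_values() / grab_commit_values().
 */
static const char *atoms_needing_obj[] = {
	"tag", "type", "object", "tree", "numparent", "parent",
};

/* Return 1 if the named atom depends on a parsed object, 0 otherwise. */
static int atom_needs_obj(const char *atom)
{
	size_t i;
	size_t nr = sizeof(atoms_needing_obj) / sizeof(atoms_needing_obj[0]);

	for (i = 0; i < nr; i++)
		if (!strcmp(atom, atoms_needing_obj[i]))
			return 1;
	return 0;
}

/*
 * Decide once per format whether the (expensive) parse step is needed:
 * it is needed only if at least one requested atom depends on `obj`.
 * A caller would check this flag before parsing each object.
 */
static int format_needs_parse(const char **atoms, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (atom_needs_obj(atoms[i]))
			return 1;
	return 0;
}
```

With this sketch, a format made only of `objectname`, `objecttype` and
`objectsize` would report that no parse is needed, while adding `tree` to
the format would require one.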