On Mon, Oct 04, 2021 at 04:13:34AM -0400, Jeff King wrote:

> It looks like adding the "algo" field did make a big difference for the
> oid_array case, but changing it to a char doesn't seem to help at all:
>
>   $ hyperfine -L v none,int,char './git.{v} cat-file --batch-all-objects --batch-check="%(objectname)"'
>   Benchmark #1: ./git.none cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.653 s ±  0.009 s    [User: 1.607 s, System: 0.046 s]
>     Range (min … max):    1.640 s …  1.670 s    10 runs
>
>   Benchmark #2: ./git.int cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.067 s ±  0.012 s    [User: 1.017 s, System: 0.050 s]
>     Range (min … max):    1.053 s …  1.089 s    10 runs
>
>   Benchmark #3: ./git.char cat-file --batch-all-objects --batch-check="%(objectname)"
>     Time (mean ± σ):      1.092 s ±  0.013 s    [User: 1.046 s, System: 0.046 s]
>     Range (min … max):    1.080 s …  1.116 s    10 runs
>
>   Summary
>     './git.int cat-file --batch-all-objects --batch-check="%(objectname)"' ran
>       1.02 ± 0.02 times faster than './git.char cat-file --batch-all-objects --batch-check="%(objectname)"'
>       1.55 ± 0.02 times faster than './git.none cat-file --batch-all-objects --batch-check="%(objectname)"'
>
> I'm actually surprised it had this much of an impact. But I guess this
> benchmark really is mostly just memcpy-ing oids into a big array,
> sorting it, and printing the result. If that array is 12% bigger, we'd
> expect at least a 12% speedup. But adding in non-linear elements like
> growing the array (though I guess that is amortized linear) and sorting
> (which picks up an extra log(n) term) make the difference.
>
> It's _kind of_ silly in a sense, since usually you'd ask for other parts
> of the object, which will make the speed difference relatively smaller.
> But just dumping a bunch of oids is actually not an unreasonable thing
> to do. I suspect it got a lot slower with 32-byte GIT_MAX_RAWSZ, too
> (even when you're using 20-byte sha1), but I don't think there's an easy
> way to get out of that.

Oh wait, I'm reading it totally wrong. Adding in the extra 4 bytes
actually made it _faster_ than not having an algo field. Now I'm
super-confused.

I could believe that it gave us some better alignment, but the original
struct was 32 bytes. 36 seems like a strictly worse number.

-Peff