On 4/17/2017 10:53 AM, Jeff Hostetler wrote:
On 4/15/2017 1:55 PM, René Scharfe wrote:
On 14.04.2017 at 21:12, git@xxxxxxxxxxxxxxxxx wrote:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Very nice, especially the perf test! But can we unbundle the different
optimizations into separate patches with their own performance numbers?
Candidates IMHO: The change in add_index_entry_with_check(), the first
hunk in has_dir_name(), the loop shortcuts. That might also help find
the reason for the slight slowdown of 0006.3 for the kernel repository.
Let me take a look at this and see if it helps.
Last night I pushed up version 11, which splits the 3 parts
of read-cache.c into 3 commits (but still in the same patch
series). This should allow for more experimentation.
The change in add_index_entry_with_check() shows a gain. For
the operations in p0006 on linux.git, the short-cut was being
taken 57993 of 57994 times.
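To make the kind of short-cut being measured concrete, here is a minimal sketch. It assumes the short-cut is of the "new path sorts after the last entry, so skip the search" variety; struct toy_index, toy_search() and toy_find_pos() are hypothetical stand-ins for illustration and are not the code in read-cache.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a sorted index; not git's struct index_state. */
struct toy_index {
	const char **names;	/* sorted ascending by strcmp() */
	size_t nr;
};

/* Binary search: returns the position if found, else -(insert_pos) - 1. */
static long toy_search(const struct toy_index *idx, const char *name)
{
	size_t lo = 0, hi = idx->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, idx->names[mid]);

		if (cmp == 0)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -(long)lo - 1;
}

/*
 * Assumed short-cut: if the new name sorts after the last entry,
 * it can only be appended, so the binary search is skipped.
 * *took_shortcut reports which path was taken.
 */
static long toy_find_pos(const struct toy_index *idx, const char *name,
			 int *took_shortcut)
{
	assert(idx && name && took_shortcut);

	if (idx->nr > 0 && strcmp(name, idx->names[idx->nr - 1]) > 0) {
		*took_shortcut = 1;
		return -(long)idx->nr - 1;
	}
	*took_shortcut = 0;
	return toy_search(idx, name);
}
```

When entries arrive in sorted order, as they would while building a new index from a sorted list of paths, nearly every call takes the first branch, which is consistent with the hit rate quoted above.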
The top of has_dir_name(), by itself, does not show a gain, but
that short-cut only triggers when the paths have no
prefix in common, which only happens when the top-level
directory changes. On linux.git, this was 19 of 57993.
However, it does set us up for the next part.
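A rough sketch of the "no prefix in common" test, under the assumption that the new path is compared against the last path already in the index. Both helpers below are hypothetical names for illustration, not functions from read-cache.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading bytes that a and b share. */
static size_t common_prefix_len(const char *a, const char *b)
{
	size_t i = 0;

	assert(a && b);
	while (a[i] && a[i] == b[i])
		i++;
	return i;
}

/*
 * Assumed form of the check: returns 1 when `name` sorts after `last`
 * and the two share no leading bytes at all. In that case no leading
 * directory of `name` can match any part of `last`.
 */
static int no_common_prefix_after(const char *last, const char *name)
{
	return strcmp(name, last) > 0 && common_prefix_len(name, last) == 0;
}
```

In a sorted walk, two consecutive paths differ in their very first byte only when the top-level directory (or top-level file) changes, which is why this case is so rare on linux.git.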
The 3 loop body short-cuts hit 54372, 3509, and 86 (sum
57967) times. So in p0006, the search was attempted only
7 times (57993 - 19 - 57967).
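The tally can be double-checked with a few lines of C. The function name remaining_searches() is made up for this illustration; the numbers are the ones quoted above.

```c
#include <assert.h>

/*
 * Calls that fall through every short-cut and still need the search:
 * total calls, minus the no-common-prefix hits, minus each of the
 * loop body short-cut hits.
 */
static long remaining_searches(long calls, long no_prefix_hits,
			       const long *loop_hits, int n)
{
	long left = calls - no_prefix_hits;
	int i;

	assert(loop_hits || n == 0);
	for (i = 0; i < n; i++)
		left -= loop_hits[i];
	return left;
}
```

With calls = 57993, no_prefix_hits = 19 and loop_hits = {54372, 3509, 86}, this gives 7.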
WRT the slowdown of 0006.3 on linux.git, I suspect this is
I/O noise. In the commit message for part 2 in V11, I
show 2 runs on linux.git with wide variance in the 0006.3
times. And given the nature of that test, the speedup in the
lookups is completely hidden by the I/O of the full checkouts.
When I step up to a repo with 4M files, the results are very
clear.
https://public-inbox.org/git/20170417213734.55373-6-git@xxxxxxxxxxxxxxxxx/