On 8/23/2018 2:53 PM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:
>
> I think you can safely
> ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
> without some mitigation, I don't know that it's all that big a deal,
> given the numbers I generated (which for some reason are less dramatic
> than Stolee's).

My numbers may be more dramatic because my Linux environment is a virtual
machine.

> If you have a chance, can you run p0001 on my patch (compared to
> 2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
> check that it really is fixing the problem you saw.

Sure. Note: I had to create a new Linux VM on a different machine
between Tuesday and today, so the absolute numbers are different.
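
For reference, the runs below came from the perf suite with something along
these lines (a sketch from memory; the repo path for the Linux run is
illustrative, not my actual setup):

    # from t/perf in a git.git checkout; compares three builds of git
    GIT_PERF_REPEAT_COUNT=10 ./run v2.18.0 v2.19.0-rc0 HEAD -- p0001-rev-list.sh

    # same test, but copying a clone of torvalds/linux as the test repo
    GIT_PERF_REPEAT_COUNT=10 GIT_PERF_REPO=/path/to/linux \
        ./run v2.18.0 v2.19.0-rc0 HEAD -- p0001-rev-list.sh
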
Using git/git:
Test      v2.18.0           v2.19.0-rc0             HEAD
-------------------------------------------------------------------------
0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5%   3.14(3.02+0.11) +1.3%
Using torvalds/linux:
Test      v2.18.0             v2.19.0-rc0               HEAD
------------------------------------------------------------------------------
0001.2:   56.08(45.91+1.50)   56.60(46.62+1.50) +0.9%   54.61(45.47+1.46) -2.6%
Now here is where I get on my soapbox (and create a TODO for myself
later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively
suggests that the results should be _more_ accurate than the default of
3. However, I then remembered that we only report the *minimum* time from
all the runs, which is likely to select an outlier from the
distribution. To test this, I ran a few tests manually and found the
variation between runs to be larger than 3%.
When I choose my own metrics for performance tests, I like to run at
least 10 runs, remove the largest AND smallest runs from the samples,
and then take the average. I did this manually for 'git rev-list --all
--objects' on git/git and got the following results:
v2.18.0   v2.19.0-rc0   HEAD
--------------------------------
3.126 s   3.308 s       3.170 s
For full disclosure, here is a full table including all samples:
| | v2.18.0 | v2.19.0-rc0 | HEAD |
|------|---------|-------------|---------|
| | 4.58 | 3.302 | 3.239 |
| | 3.13 | 3.337 | 3.133 |
| | 3.213 | 3.291 | 3.159 |
| | 3.219 | 3.318 | 3.131 |
| | 3.077 | 3.302 | 3.163 |
| | 3.074 | 3.328 | 3.119 |
| | 3.022 | 3.277 | 3.125 |
| | 3.083 | 3.259 | 3.203 |
| | 3.057 | 3.311 | 3.223 |
| | 3.155 | 3.413 | 3.225 |
| Max | 4.58 | 3.413 | 3.239 |
| Min | 3.022 | 3.259 | 3.119 |
| Avg* | 3.126 | 3.30825 | 3.17025 |

(* average after dropping the max and min samples)
(Note that the largest one was the first run, on v2.18.0, which is due
to a cold disk.)
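
For what it's worth, that "drop max and min, then average" step is small
enough to do with a one-off pipeline (just a sketch; times.txt is a
hypothetical file with one wall-clock time per line):

    # drop the fastest and slowest sample, then average the rest
    sort -n times.txt | sed '1d;$d' |
        awk '{ sum += $1; n++ } END { print sum / n }'
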
I just kicked off a script that will run this test on the Linux repo
while I drive home. I'll be able to report a similar table of data easily.
My TODO is to consider aggregating the data this way (or with a median)
instead of reporting the minimum.
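
(The median would be a similarly small pipeline over the same hypothetical
times.txt:)

    # middle sample, or the average of the two middle samples if even
    sort -n times.txt | awk '
        { a[NR] = $1 }
        END {
            if (NR % 2) print a[(NR + 1) / 2]
            else print (a[NR / 2] + a[NR / 2 + 1]) / 2
        }'
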
Thanks,
-Stolee