On 6/7/2018 10:45 AM, Ævar Arnfjörð Bjarmason wrote:
On Thu, Jun 07 2018, Derrick Stolee wrote:
To test the performance in this situation, I created a
script that organizes the Linux repository in a similar
fashion. I split the commit history into 50 parts by
creating branches on every 1,000 commits of the first-
parent history. Then, `git rev-list --objects A ^B`
provides the list of objects reachable from A but not B,
so I could send that to `git pack-objects` to create
these "time-based" packfiles. With these 50 packfiles
(deleting the old one from my fresh clone, and deleting
all tags as they were no longer on-disk) I could then
test 'git rev-list --objects HEAD^{tree}' and see:
Before: 0.17s
After: 0.13s
% Diff: -23.5%
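
For reference, the "% Diff" figure is the relative change from the "Before" time to the "After" time. A minimal sketch of that arithmetic follows; the helper name `percent_diff` is made up for illustration and is not part of Git or of the series.

```shell
#!/bin/bash
# Hypothetical helper: relative change from "before" to "after", in percent.
percent_diff () {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# Timings quoted above: 0.17s before, 0.13s after.
percent_diff 0.17 0.13    # prints -23.5
```
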
By adding logic to count hits and misses to bsearch_pack,
I was able to see that the command above calls that
method 266,930 times with a hit rate of 33%. The MIDX
has the same number of calls with a 100% hit rate.
Do you have the script you used for this? It would be very interesting
as something we could stick in t/perf/ to test this use-case in the
future.
How does this & the numbers below compare to just a naïve
--max-pack-size=<similar size> on linux.git?
Is it possible for you to tar this test repo up and share it as a
one-off? I've been polishing the core.validateAbbrev series I have, and
it would be interesting to compare some of the (abbrev) numbers.
Here is what I used. You will want to adjust the constants for whatever
repo you are using. This is for the Linux kernel, which has a
first-parent history of ~50,000 commits. It also leaves a bunch of extra
files around, so it is nowhere near ready to be incorporated into the code.
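#!/bin/bash

# Create 50 branches, batch/1 (oldest) through batch/50 (HEAD),
# spaced 1000 first-parent commits apart.
for i in `seq 1 50`
do
    ORDER=$((51 - $i))
    NUM_BACK=$((1000 * ($i - 1)))
    echo creating batch/$ORDER
    git branch -f batch/$ORDER HEAD~$NUM_BACK
    echo batch/$ORDER
    git rev-parse batch/$ORDER
done
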
# Write one packfile per batch, containing the objects reachable from
# batch/$i but not from the previous batch.
lastbranch=""
for i in `seq 1 50`
do
    branch=batch/$i
    if [ -z "$lastbranch" ]
    then
        echo "$branch"
        git rev-list --objects $branch | sed 's/ .*//' >objects-$i.txt
    else
        echo "$lastbranch"
        echo "$branch"
        git rev-list --objects $branch ^$lastbranch | sed 's/ .*//' >objects-$i.txt
    fi

    git pack-objects --no-reuse-delta .git/objects/pack/branch-split2 <objects-$i.txt
    lastbranch=$branch
done

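# Delete all tags, then remove the original packfile(s) so that only the
# batch packs remain, and write the MIDX.
for tag in `git tag --list`
do
    git tag -d $tag
done

rm -rf .git/objects/pack/pack-*
git midx write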
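
To sanity-check the result, something like the following could be run afterwards. This is only a sketch: `count_packs` is a made-up helper, and I am assuming the packs live under the repository's own `objects/pack` directory. It counts the packfiles that remain and then times the command from the measurements quoted above.

```shell
#!/bin/bash
# Hypothetical helper: count the *.pack files directly inside a directory.
count_packs () {
    n=0
    for f in "$1"/*.pack
    do
        # The glob stays unexpanded when nothing matches, so check existence.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Only run the checks when inside a Git repository.
if git rev-parse --git-dir >/dev/null 2>&1
then
    pack_dir="$(git rev-parse --git-dir)/objects/pack"
    echo "packfiles: $(count_packs "$pack_dir")"

    # The command timed in the quoted message.
    time git rev-list --objects 'HEAD^{tree}' >/dev/null
fi
```

If the script above ran to completion, I would expect this to report 50 packfiles, one per batch, though I have not checked what happens if two batches happen to produce identical packs.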