On 10/5/2017 6:00 AM, Jeff King wrote:
On Thu, Oct 05, 2017 at 06:48:10PM +0900, Junio C Hamano wrote:
Jeff King <peff@xxxxxxxx> writes:
This is weirdly specific. Can we accomplish the same thing with existing
tools?
E.g., could:

  git cat-file --batch-all-objects --batch-check='%(objectname)' |
  shuffle |
  head -n 100

do the same thing?
I know that "shuffle" isn't available everywhere, but I'd much rather
see us fill in portability gaps in a general way, rather than
introducing one-shot C code that needs to be maintained (and you
wouldn't _think_ that t/helper programs need much maintenance, but try
perusing "git log t/helper" output; they have to adapt to the same
tree-wide changes as the rest of the code).
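One portable way to fill that gap without a one-shot helper is the classic decorate-sort-cut shuffle, which needs only awk, sort, and cut (a sketch; "shuf" or "sort -R" do the same job where they exist, and awk's srand() seeds from the clock, so runs in the same second repeat the same order):

```shell
# Sample 100 random object names using only widely available tools:
# prefix each line with a random key, sort by that key, strip it off.
git cat-file --batch-all-objects --batch-check='%(objectname)' |
awk 'BEGIN { srand() } { printf "%.8f\t%s\n", rand(), $0 }' |
sort -k1,1n |
cut -f2- |
head -n 100
```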
I was thinking about this a bit more, and came to the conclusion
that "sort -R" and "shuf" are the wrong tools to use. We would want
to measure with something close to a real-world workload. For
example, letting
git rev-list --all --objects
produce the list of objects in traversal order (i.e. this is very
similar to the order in which "git log -p" needs to access the
objects) and chomping at the number of sample objects you need in
your test would give you such a list.
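That sampling boils down to something like the following (a sketch; for non-commit objects rev-list prints "<oid> <path>", so the first field is always the object name):

```shell
# Take the first 100 objects in the same traversal order that
# commands like "git log -p" would access them.
git rev-list --all --objects |
awk '{ print $1 }' |
head -n 100
```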
Actually, I'd just as soon see timings for "git log --format=%h" or "git
log --raw", as opposed to patches 1 and 2.
You won't see a 90% speedup there, but you will see the actual
improvement that real-world users are going to experience, which is way
more important, IMHO.
-Peff
Thanks for thinking hard about this.
For some real-user context: Some engineers using Git for the Windows
repo were seeing extremely slow commands, such as 'fetch' or 'commit',
and when we took a trace we saw that most of the time was spent in this
abbreviation code. Our workaround so far has been to set core.abbrev=40.
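For reference, that workaround is a one-line config change, trading full-length object names for skipping the disambiguation search:

```shell
# Print full 40-hex-character object names instead of abbreviating,
# which avoids the uniqueness search entirely.
git config core.abbrev 40
```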
I'll run some perf tests for the commands you recommend, and also
see if I can replicate some of the pain points that triggered this
change using the Linux repo.
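A rough sketch of the measurement I have in mind (assuming a local clone of the Linux repo in ./linux; "%h" forces an abbreviation lookup per commit, so comparing against core.abbrev=40 should isolate the abbreviation cost):

```shell
# Abbreviation-heavy traversal vs. the core.abbrev=40 workaround.
cd linux
time git log --format=%h >/dev/null
time git -c core.abbrev=40 log --format=%h >/dev/null
time git log --raw >/dev/null
```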
Thanks,
-Stolee