On Thu, Oct 05, 2017 at 06:48:10PM +0900, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > This is weirdly specific. Can we accomplish the same thing with
> > existing tools?
> >
> > E.g., could:
> >
> >   git cat-file --batch-all-objects --batch-check='%(objectname)' |
> >   shuffle |
> >   head -n 100
> >
> > do the same thing?
> >
> > I know that "shuffle" isn't available everywhere, but I'd much rather
> > see us fill in portability gaps in a general way, rather than
> > introducing one-shot C code that needs to be maintained (and you
> > wouldn't _think_ that t/helper programs need much maintenance, but try
> > perusing "git log t/helper" output; they have to adapt to the same
> > tree-wide changes as the rest of the code).
>
> I was thinking about this a bit more, and came to the conclusion
> that "sort -R" and "shuf" are the wrong tools to use. We would want
> to measure with something close to a real-world workload. For
> example, letting
>
>   git rev-list --all --objects
>
> produce the list of objects in traversal order (i.e. this is very
> similar to the order in which "git log -p" needs to access the
> objects) and chomping at the number of sample objects you need in
> your test would give you such a list.

Actually, I'd just as soon see timings for "git log --format=%h" or
"git log --raw", as opposed to patches 1 and 2. You won't see a 90%
speedup there, but you will see the actual improvement that real-world
users are going to experience, which is way more important, IMHO.

-Peff
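
P.S. For what it's worth, Junio's "traversal order, chomped at N" sampling
could be sketched as below. The throwaway repo setup is only there to make
the snippet self-contained; N and the commit contents are arbitrary.

```shell
# Sample the first N objects in traversal order -- close to the order
# in which "git log -p" accesses objects -- instead of shuffling.
# The temporary repo below is illustrative only.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git -c user.email=a@example.com -c user.name=A \
    commit -q --allow-empty -m one
echo hello >file && git add file
git -c user.email=a@example.com -c user.name=A commit -q -m two

# Objects in traversal order; chomp at the sample size you need.
# (Tree/blob lines are "sha path", so cut keeps only the object name.)
N=3
sample=$(git rev-list --all --objects | head -n "$N" | cut -d' ' -f1)
echo "$sample"
```

Unlike "shuf"/"sort -R", this needs nothing beyond git and POSIX tools,
and the sample reflects the access pattern of a real log walk.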