Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Tue, Feb 11, 2014 at 6:17 PM, David Kastrup <dak@xxxxxxx> wrote:
>>
>> Looking in the Makefile, I just find support for coverage reports
>> using gcov.  Whatever is there with "profile" in it seems to be for
>> profile-based compilation rather than using gprof.
>>
>> Now since I've managed to push most of the runtime for basic
>> git-blame operation out of blame.c proper, it becomes important to
>> figure out where most of the remaining runtime (a sizable part of
>> that being system time) is being spent.  Loop counts like those
>> provided by gcov (or am I missing something here?) are not helpful
>> for that; I think I rather need the kind of per-function breakdown
>> that gprof provides.
>>
>> Is there a reason there are no prewired recipes or advice for using
>> gprof on git?  Is there a way to get the work done, namely seeing
>> the actual distribution of call times (rather than iterations),
>> using gcov so that this is not necessary?
>
> Would perf help? No changes required, and almost no overhead, I think.

Not useful.  It would probably be nice for nailing down the performance
gains when the work is finished, so that future regressions will be
noticeable.  It's reasonably easy to create a test case that will take
hours with the current git-blame and would finish in seconds with the
improved one.  But it's not useful at all for figuring out the hotspots
within the git-blame binary.

I made do with something like

    make CFLAGS=-pg LDFLAGS=-pg

but it is sort of annoying that the required "make clean" apparently
also cleans out the coverage files: for sensible finding of bad stuff
one needs them as well.  At any rate, I'll probably figure out
something eventually.  No idea whether I'll get around to writing some
useful instructions.

At the current point in time, it would appear that a large part of the
remaining user time (about half) is spent in xdl_hash_record, so x86_64
architectures already benefit from XDL_FAST_HASH (which seems to hurt
more than it would help with my i686).  So finding a good fast hash
function would likely help.

The current hash function _and_ the XDL_FAST_HASH replacement used on
x86_64 are a drag here because they are not easily split into a
word-unaligned part, a (typically long and thus most
performance-relevant) word-aligned part, and another word-unaligned
part in a manner that allows calculating the same hash for different
alignments.

Something like a CRC lends itself much better to that, but since its
basic operation is more complex on a general-purpose CPU, it's not
likely to result in a net win.  In assembly language, add-and-multiply
operations modulo 2^32-1 are pretty easy to do and lend themselves well
to considering alignment at the end, but in C, access to mixed-precision
multiplications and the carry flag is rather awkward.

-- 
David Kastrup
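
To make the xdl_hash_record point above concrete, here is a simplified,
from-memory sketch of the kind of per-byte line hash involved (not the
exact code in git's xdiff, and ignoring the whitespace-handling flags):
the state is updated one byte at a time with a shift-add and an xor, so
there is no obvious cheap way to consume a whole aligned word per step.

    #include <stdio.h>

    /* Simplified per-line hash in the style of xdl_hash_record:
     * hash the bytes of one line up to (not including) the newline
     * and advance *data past it.  Every byte feeds sequentially into
     * the running state (multiply by 33 via shift-add, then xor).
     */
    static unsigned long hash_line(const char **data, const char *top)
    {
            unsigned long ha = 5381;
            const char *ptr = *data;

            for (; ptr < top && *ptr != '\n'; ptr++) {
                    ha += (ha << 5);
                    ha ^= (unsigned long)(unsigned char) *ptr;
            }
            *data = ptr < top ? ptr + 1 : ptr;

            return ha;
    }

    int main(void)
    {
            const char buf[] = "some line of input\nnext line\n";
            const char *p = buf;
            const char *top = buf + sizeof(buf) - 1;

            /* hash each line of the buffer in turn */
            while (p < top)
                    printf("%lx\n", hash_line(&p, top));
            return 0;
    }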
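
As for the add-and-multiply-modulo-2^32-1 idea, a minimal C sketch of
such an update step looks like the following (multiplier and seed are
arbitrary constants chosen only for illustration, not anything git
uses).  It also shows exactly where C gets awkward: the reduction that
an add-with-carry handles for free in assembly has to go through a
64-bit intermediate and a fold here.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Reduce a 64-bit value modulo 2^32 - 1 by folding the high word
     * onto the low word (2^32 == 1 modulo 2^32 - 1).  In assembly this
     * is essentially an add-with-carry; in C we need the 64-bit detour.
     */
    static uint32_t fold_mod(uint64_t x)
    {
            x = (x >> 32) + (x & 0xffffffffu);
            x = (x >> 32) + (x & 0xffffffffu);  /* first fold may carry once more */
            return x == 0xffffffffu ? 0 : (uint32_t) x;
    }

    /* Add-and-multiply hash modulo 2^32 - 1 over a byte range. */
    static uint32_t mul_hash(const unsigned char *p, size_t len, uint32_t seed)
    {
            uint64_t h = seed;

            while (len--)
                    h = fold_mod(h * 0x01000193u + *p++);

            return (uint32_t) h;
    }

    int main(void)
    {
            static const unsigned char line[] = "some line of input";

            printf("%08x\n", (unsigned) mul_hash(line, sizeof(line) - 1, 1));
            return 0;
    }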