Re: Git's database structure

Johannes Schindelin wrote:
But you can add _yet another_ index to it, which can be generated on the fly, so that Git only has to generate the information once, and then reuse it later. As a benefit of this method, the underlying well-tested structure needs no change at all.

And in fact, you can do this today, without modifying git-blame at all, by (ab)using its "-S" option (which lets you specify a custom ancestry chain to search). By coincidence, I was just showing some people at my office how to do this yesterday. I'll cut-and-paste from the email I sent them. I am not claiming this is nearly as desirable as a built-in, auto-updated secondary index, but it proves the concept, anyway.

Fast-to-generate version:

git rev-list HEAD -- main.c | awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

This speeds things up a lot, because git blame doesn't have to examine other revisions:

time git blame main.c
  1.56s user 0.30s system 99% cpu 1.868 total
time git blame -S /tmp/revlist main.c
  0.21s user 0.03s system 96% cpu 0.249 total
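
In case it isn't obvious what ends up in /tmp/revlist, here is a small sketch of what the awk step does. The names c3, c2 and c1 are made-up placeholders standing in for real hashes (newest first, as rev-list prints them), and pair_up is just a name I picked for the illustration. Each output line is a revision followed by the next older one in the list, which is the "ancestry chain" shape that -S is given in the command above.

```shell
# pair_up: hypothetical wrapper around the awk one-liner from the command above.
# Reads one revision per line (newest first) and prints "revision older-revision".
pair_up() {
  awk '{if (last) print last " " $0; last=$0;}'
}

# c3, c2, c1 are placeholder names, not real commit hashes.
printf '%s\n' c3 c2 c1 | pair_up
# prints:
#   c3 c2
#   c2 c1
```

The oldest revision never appears at the start of a line, so nothing is recorded about what came before it.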

The bad news is that generating that revision list is a bit slow, and if you do it the naive way I suggested above, you can't use the rev list with the -M option (to follow renames). The good news is that it's possible to have that too if you generate a list of revisions that includes the renames:

# Generate a list of all revisions in the right order (only need to do this once, not once per file)
git rev-list HEAD > /tmp/all-revs
# Generate a list of the revisions that touched this file, following copies/renames.
# Could do this in fewer commands but this is hopefully easier to follow.
git blame --porcelain -M main.c | \
  grep -E '^[0-9a-f]{40}' | \
  cut -d' ' -f1 | \
  grep -F -f - /tmp/all-revs | \
  awk '{if (last) print last " " $0; last=$0;}' > /tmp/revlist

Then -M is fast too:

time git blame -M main.c
  1.72s user 0.27s system 89% cpu 2.219 total
time git blame -M -S /tmp/revlist main.c
  0.29s user 0.03s system 93% cpu 0.341 total
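
If you want to do this for more than one file, the steps can be wrapped in a shell function. This is only a rough sketch: make_revlist and its two arguments are names I made up, nothing like it ships with Git, and it assumes you run it from inside the work tree. It uses grep -E and grep -F, which are the standard spellings of egrep and fgrep, and a temporary file in place of /tmp/all-revs.

```shell
# make_revlist FILE OUT: hypothetical helper, not part of Git.
# Writes "revision older-revision" pairs for FILE to OUT, based on the
# revisions that "git blame -M" attributes lines to.
make_revlist() {
  file=$1
  out=$2
  all_revs=$(mktemp) || return 1

  # Every revision reachable from HEAD, newest first.
  git rev-list HEAD > "$all_revs"

  # Pick out the hashes blame reports, keep only those revisions from the
  # full list (which preserves rev-list order), then pair each one with
  # the next older one.
  git blame --porcelain -M -- "$file" |
    grep -E '^[0-9a-f]{40}' |
    cut -d' ' -f1 |
    grep -F -f - "$all_revs" |
    awk '{if (last) print last " " $0; last=$0;}' > "$out"

  rm -f "$all_revs"
}
```

Usage would be something like "make_revlist main.c /tmp/revlist" followed by "git blame -M -S /tmp/revlist main.c". Note that the list goes stale as soon as a new commit touches the file, so it has to be regenerated by hand.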

Oddly, if you use the -S option, "git blame -C" actually gets significantly *slower*. I am not sure why.

-Steve