Re: Some git performance measurements..

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> Umm. See my earlier numbers. For "git checkout" with cold cache, the 
> *bulk* of the time is actually the ".gitignore" file lookups, so if you 
> see a three-second improvement out of 17s, it may not look spectacular, 
> but considering that probably 10s of those 17s were something *else* going 
> on, I suspect that if you really did just a plain "git checkout", you 
> actually *do* have a spectacular improvement of roughly 7s -> 4s!

I am hoping that "probably 10s of those 17s" can actually be measured
with the patch I sent out last night.  Has anybody taken a look at it?

Partitioning the pack data by object type shifts the tradeoffs from the
current "the data in the same tree are mostly together, except commits
are treated differently because rev walk is done quite often" layout.
Because we never look at blob objects while pruning the history
(unless the -Spickaxe option is used, I think), a partitioned layout
would optimize ancestry walking even more than the current packfile
layout does.

On the other hand, any operation that wants to look at the contents is
penalized.  A two-tree diff that inspects the contents (e.g. fuzzy
renames and pickaxe) needs to read from the tree section to find which
blob to compare with which other blob, and then needs to seek to the
blob section to actually read the contents, while the current layout
tends to group both trees and blobs that belong to the same tree
together.  It is natural that blame is penalized by the new layout,
mostly because it needs to grab two blobs to compare from each
parent-child pair, but also because it needs to compute two-tree diffs
for each parent-child pair it traverses whenever it has to follow
across renames (that is, when it sees there is no corresponding path
in the parent).  I would expect a similar slowdown from grep, which
wants to inspect the blobs that are in the same tree.
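To make the tradeoff above concrete, here is a toy model in Python.  It
is NOT git's pack-objects code: the object sizes, the 4 KiB "page" used
as a cold-cache I/O proxy, and every name in it (toy_history, cost,
walk_accesses, content_accesses) are made up for illustration.  The
"current" layout is simplified to what is described above: commits
grouped up front, then each tree followed by the blobs it names.  The
partitioned layout stably regroups the same objects by type.  cost()
replays a sequence of reads and reports how many distinct pages were
touched and how many reads were not contiguous with the previous one.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size for a crude cold-cache I/O proxy
TYPE_ORDER = {"commit": 0, "tree": 1, "blob": 2}


@dataclass(frozen=True)
class Obj:
    name: str
    kind: str  # "commit", "tree" or "blob"
    size: int  # bytes the object occupies in the pack (made-up numbers)


def toy_history(n, blobs_per_tree=3, commit_size=250, tree_size=300, blob_size=6000):
    """Hypothetical history: every commit has one tree, each tree a few blobs."""
    hist = []
    for i in range(n):
        commit = Obj(f"c{i}", "commit", commit_size)
        tree = Obj(f"t{i}", "tree", tree_size)
        blobs = [Obj(f"b{i}.{j}", "blob", blob_size) for j in range(blobs_per_tree)]
        hist.append((commit, tree, blobs))
    return hist


def current_layout(hist):
    """Simplified current layout: commits first, then each tree and its blobs."""
    order = [c for c, _, _ in hist]
    for _, tree, blobs in hist:
        order += [tree, *blobs]
    return order


def partitioned_layout(hist):
    """Same objects, stably regrouped by type: commits, then trees, then blobs."""
    return sorted(current_layout(hist), key=lambda o: TYPE_ORDER[o.kind])


def cost(layout, accesses):
    """Replay reads in order; return (distinct pages touched, non-contiguous reads)."""
    where, off = {}, 0
    for obj in layout:
        where[obj.name] = (off, off + obj.size)
        off += obj.size
    pages, seeks, prev_end = set(), 0, None
    for name in accesses:
        start, end = where[name]
        pages.update(range(start // PAGE, (end - 1) // PAGE + 1))
        if prev_end is not None and start != prev_end:
            seeks += 1
        prev_end = end
    return len(pages), seeks


def walk_accesses(hist):
    """History pruning: read each commit and its tree, never a blob."""
    return [o.name for c, t, _ in hist for o in (c, t)]


def content_accesses(hist):
    """Blame/grep-like work: read each tree, then the blobs it points to."""
    return [o.name for _, t, blobs in hist for o in (t, *blobs)]


if __name__ == "__main__":
    hist = toy_history(100)
    for label, layout in (("current", current_layout(hist)),
                          ("partitioned", partitioned_layout(hist))):
        print(label,
              "walk (pages, seeks):", cost(layout, walk_accesses(hist)),
              "content (pages, seeks):", cost(layout, content_accesses(hist)))
```

With these made-up sizes, the commit+tree walk touches far fewer pages
under the partitioned layout (the trees sit in one dense run instead of
being spread between blobs), while the tree-then-blobs workload, which
reads strictly sequentially in the simplified current layout, has to
jump between the tree and blob sections roughly twice per commit once
the pack is partitioned.  The absolute numbers mean nothing; only the
direction of each effect is the point.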

When I do archaeology, I often run blame first to see which change
brought the block of text into its current shape, and then run a
path-limited "git log -p" either starting or ending at that revision.
In that workflow, the initial blame may get slower with the new layout,
but I suspect it would help overall by speeding up the later
"git log -p" step.
