Re: git performance

On Wed, Oct 22, 2008 at 05:55:14PM -0400, Edward Ned Harvey wrote:

> I'm talking about 40-50,000 files, on multi-user production linux,
> which means the cache is never warm, except when I'm benchmarking.

Well, if you have a cold cache it's going to take longer. :) You should
probably benchmark if you want to know exactly how long.
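If you do benchmark, here is a minimal Python sketch (the helper names `time_command`, `summarize` and `BASELINE` are my own, not part of git or svn). It runs a command several times and reports the first run separately from the rest. On a machine whose cache is rarely warm, the first run is the closest to what users actually see. Nothing here flushes the OS page cache, so the later runs mostly measure a warm cache.

```python
import statistics
import subprocess
import sys
import time

# A do-nothing Python process; its run time is a rough floor for
# process start-up overhead, useful as a sanity check on the numbers.
BASELINE = [sys.executable, "-c", "pass"]


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run.

    Output is discarded. The page cache is not dropped between runs,
    so only the first run has a chance of being cold.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (first run, median of the remaining runs)."""
    if len(timings) < 2:
        raise ValueError("need at least two runs")
    return timings[0], statistics.median(timings[1:])
```

For example, from inside the working copy, compare `summarize(time_command(["git", "status"]))` with `summarize(time_command(["svn", "status"]))`.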

> Specifically RHEL 4 with the files on NFS mount.  Cold cache "svn st"
> takes ~10 mins.  Warm cache 20-30 sec.  Surprisingly to me,

Wow, that is awful. For comparison, "git status" from a cold cache on the
kernel repo takes me 17 seconds. From a warm cache, less than half a
second.

Yes, the cold cache case would probably be better with inotify, but
compared to svn, that's screaming fast. I haven't used Perforce. If your
bottleneck really is stat'ing the tree, then yes, something that avoided
that might perform better (but weigh that particular optimization
against other things which might be slower).
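To make "stat'ing the tree" concrete: a status check has to lstat() every tracked file, and git compares each result against the stat data cached in its index, re-reading only files whose stat data changed. Below is a toy Python sketch of that idea. It caches only mtime and size, and `snapshot`/`changed_paths` are hypothetical names. Git's real index stores more stat fields and handles racy timestamps, none of which is modeled here. Note that the walk itself is still one lstat per file, which is exactly the part that hurts on a cold NFS cache.

```python
import os


def snapshot(root):
    """Walk root and lstat every file entry: {relative path: (mtime_ns, size)}."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # one stat per file: the cost on a cold cache
            result[os.path.relpath(path, root)] = (st.st_mtime_ns, st.st_size)
    return result


def changed_paths(root, cached):
    """Return paths that are new, deleted, or whose stat data differs from `cached`.

    Only these paths would need their contents re-read; everything else is
    assumed unchanged because its cached stat data still matches.
    """
    current = snapshot(root)
    changed = {p for p, sig in current.items() if cached.get(p) != sig}
    deleted = set(cached) - set(current)
    return changed | deleted
```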

> Out of curiosity, what are they talking about, when they say "git is
> fast?"

Well, there are the numbers above. When comparing to SVN or (god forbid)
CVS, there are order-of-magnitude speedups for most common operations.

>  Just the fact that it's all local disk, or is there more to it
> than that?  I could see - git would probably outperform perforce for

The things that generally make git fast are:

  - using a compact on-disk structure (including zlib and aggressive
    delta-finding) to keep your cache warm (and when it's not warm, to
    get data off the disk as quickly as possible)

  - the content-addressable nature of objects means we can just look at
    the data we need to solve a problem. For example,
    getting the history between point A and point B is "O(the number of
    commits between A and B)", _not_ "O(the size of the repo)".
    Viewing a log without generating diffs is "O(the number of
    commits)", not "O(some combination of the number of commits and the
    number of files in each commit)". Diffing two points in history is
    "O(the size of the differences between the two points)" and is
    totally independent of the number of commits between the two points.

  - most operations are streamable. "git log >/dev/null" on the kernel
    repo (about 90,000 commits) takes 8.5 seconds on my box. But it
    starts generating output immediately, so it _feels_ instant, and the
    rest of the data is generated while I read the first commit in my
    pager.
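The points above can be illustrated with a toy Python sketch. `object_id` and `loose_object_bytes` follow git's loose-object layout: the SHA-1 is taken over a "type size\0" header plus the content, and the same bytes are stored zlib-compressed. Packfile delta compression is not shown. The commit store, `make_commit` and `walk` are hypothetical simplifications, with a made-up commit format and linear history only. They are meant to show two things: walking from B back to A touches only the commits in between, and a generator lets output start before the walk finishes.

```python
import hashlib
import zlib
from typing import Dict, Iterator, Optional, Tuple


def _header(obj_type: str, data: bytes) -> bytes:
    return f"{obj_type} {len(data)}\0".encode("ascii")


def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over git's object header plus content; the name is the content."""
    return hashlib.sha1(_header(obj_type, data) + data).hexdigest()


def loose_object_bytes(obj_type: str, data: bytes) -> bytes:
    """zlib-compressed header+content, the on-disk form of a loose object."""
    return zlib.compress(_header(obj_type, data) + data)


# Toy commit store (hypothetical, NOT git's real commit format):
# commit id -> (parent id or None, message).
Commit = Tuple[Optional[str], str]


def make_commit(store: Dict[str, Commit], parent: Optional[str], message: str) -> str:
    parent_line = f"parent {parent}\n" if parent else ""
    cid = object_id("commit", (parent_line + "\n" + message).encode())
    store[cid] = (parent, message)
    return cid


def walk(store: Dict[str, Commit], start: str,
         stop: Optional[str] = None) -> Iterator[Tuple[str, str]]:
    """Yield (id, message) from start back to, but not including, stop.

    Work is proportional to the commits visited, not the size of the
    store, and each commit is yielded as soon as it is looked up.
    """
    cid: Optional[str] = start
    while cid is not None and cid != stop:
        parent, message = store[cid]
        yield cid, message
        cid = parent
```

Because `walk` is a generator, a consumer such as a pager can print the first commit before the rest have been read, which is the "feels instant" effect described above.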

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
