Re: git annotate runs out of memory

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 11 Dec 2007 10:40:36 -0800 (PST)

On Tue, 11 Dec 2007, Daniel Berlin wrote:
>
> This seems to be a common problem with git. It seems to use a lot of
> memory to perform common operations on the gcc repository (even though
> it is faster in some cases than hg).

The thing is, git has a very different notion of "common operations" than 
you do.

To git, "git annotate" is just about the *last* thing you ever want to do. 
It's not a common operation, it's a "last resort" operation. In git, the 
whole workflow is designed for "git log -p <pathnamepattern>" rather than 
annotate/blame.

In fact, we didn't support annotate at all for the first year or so of 
git.

The reason git is relatively slow here is exactly that git doesn't have 
"file history" at all, and only tracks full snapshots. So "git blame" is 
really a very complex operation: it looks at the global history (because 
nothing else exists) and generates a totally different "view", a local 
per-file history, from it.
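
To make that concrete, here is a toy sketch in Python of deriving a 
per-file blame from nothing but whole-tree snapshots. It is not git's 
code: the blame() function, the (commit_id, snapshot) layout and the 
linear, merge-free history are all assumptions made for illustration, 
and real git walks history backwards from the newest commit rather than 
forwards.

```python
from difflib import SequenceMatcher


def blame(history, path):
    """Attribute each line of `path` in the newest snapshot to a commit.

    `history` is a list of (commit_id, snapshot) pairs, oldest first, where
    a snapshot maps every path in the tree to its list of lines. That layout
    is a simplification: linear history, no merges, no packing.
    """
    prev_lines, prev_owner = [], []
    # Every commit is visited, even ones that never touched `path`, because
    # the snapshots are the only history there is.
    for commit_id, snapshot in history:
        lines = snapshot.get(path, [])
        owner = [commit_id] * len(lines)  # assume new until matched
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever introduced them earlier.
                owner[j1:j2] = prev_owner[i1:i2]
        prev_lines, prev_owner = lines, owner
    return list(zip(prev_owner, prev_lines))


history = [
    ("c1", {"f.c": ["a", "b"]}),
    ("c2", {"f.c": ["a", "b"], "g.c": ["z"]}),  # does not touch f.c
    ("c3", {"f.c": ["a", "x", "b"], "g.c": ["z"]}),
]
for commit_id, line in blame(history, "f.c"):
    print(commit_id, line)
```

Note that the loop still has to look at "c2" even though it never 
changed f.c. That is where the extra cost comes from.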

The disadvantage is that it's much slower and much more costly than just 
having a local history view to begin with.

However, the absolutely *huge* advantage is that it isn't then limited to 
local history.

So where git shines is when you actually use the global history: when you 
do merges, or when you track more than one file (which others find hard, 
but git finds much more natural).

An example of this is content that actually comes from multiple files. 
File-based systems simply cannot do this at all. They aren't just slower, 
they are totally unable to do it sanely. For git, it's all the same: it 
never really cares about file boundaries in the first place.
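
As a rough illustration of why file boundaries don't matter when all you 
have is snapshots, the toy Python sketch below looks for the origin of 
newly added lines in the *other* files of the parent snapshot. The 
find_origins() helper and the dict-of-lines snapshot layout are 
hypothetical. git's real detection of moved and copied lines (what 
"git blame -C" asks for) is considerably more involved than this exact 
line match.

```python
def find_origins(parent, child, path):
    """Map each line added to `path` to the other parent files that had it."""
    old_lines = set(parent.get(path, []))
    # Index every line of every *other* file in the parent snapshot.
    seen_in = {}
    for other_path, lines in parent.items():
        if other_path == path:
            continue
        for line in lines:
            seen_in.setdefault(line, set()).add(other_path)
    origins = {}
    for line in child.get(path, []):
        if line not in old_lines and line in seen_in:
            origins[line] = sorted(seen_in[line])
    return origins


# Hypothetical snapshots: "int g();" moves from a.c to b.c in one commit.
parent = {"a.c": ["int f();", "int g();"], "b.c": ["int h();"]}
child = {"a.c": ["int f();"], "b.c": ["int h();", "int g();"]}
print(find_origins(parent, child, "b.c"))
```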

The other example is doing things like "git log -p drivers/char", where 
you don't ask for the log of a single file, but a general file pattern, 
and get (still atomic!) commits as the result.
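
In the same toy model, selecting commits by path pattern is just a 
filter over whole commits. The Python sketch below assumes a linear 
history of (commit_id, snapshot) pairs and a plain prefix match. The 
log_for_prefix() name and the file names are made up for the example, 
and git's actual pathspec matching is richer than startswith().

```python
def log_for_prefix(history, prefix):
    """Return (commit_id, changed_paths) for commits touching `prefix`."""
    selected = []
    prev = {}
    for commit_id, snapshot in history:
        changed = sorted(
            p for p in set(prev) | set(snapshot)
            if p.startswith(prefix) and prev.get(p) != snapshot.get(p)
        )
        if changed:
            # The commit is reported whole; it is never split per file.
            selected.append((commit_id, changed))
        prev = snapshot
    return selected


# Hypothetical history, oldest first.
history = [
    ("c1", {"drivers/char/tty.c": ["v1"], "drivers/net/e100.c": ["v1"]}),
    ("c2", {"drivers/char/tty.c": ["v1"], "drivers/net/e100.c": ["v2"]}),
    ("c3", {"drivers/char/tty.c": ["v2"], "drivers/char/mem.c": ["v1"],
            "drivers/net/e100.c": ["v2"]}),
]
print(log_for_prefix(history, "drivers/char/"))
```

Here "c2" drops out because it only touched drivers/net, while "c3" 
shows up once, with both of the files it changed under the prefix.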

And perhaps the best example is just tracking code when you have two files 
that merge into one (possibly because the "same" file was created 
independently in two different branches). git gets things like that right 
without even thinking about it. Others tend to just flounder about and 
can't do anything at all about it.

That said, I'll see if I can speed up "git blame" on the gcc repository. 
It _is_ a fundamentally much more expensive operation than it is for 
systems that do single-file things.

			Linus