On Tue, 11 Dec 2007, Daniel Berlin wrote:

> On 12/11/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> > On Tue, 11 Dec 2007, Daniel Berlin wrote:
> > >
> > > This seems to be a common problem with git. It seems to use a lot of
> > > memory to perform common operations on the gcc repository (even though
> > > it is faster in some cases than hg).
> >
> > The thing is, git has a very different notion of "common operations" than
> > you do.
> >
> > To git, "git annotate" is just about the *last* thing you ever want to do.
> > It's not a common operation, it's a "last resort" operation. In git, the
> > whole workflow is designed for "git log -p <pathnamepattern>" rather than
> > annotate/blame.
> >
> I understand this, and completely agree with you.
> However, I cannot force GCC people to adopt completely new workflow in
> this regard.
> The changelog's are not useful enough (and we've had huge fights over
> this) to do git log -p and figure out the info we want.
> Looking through thousands of diffs to find the one that happened to
> your line is also pretty annoying.
> Annotate is a major use for gcc developers as a result
> I wish I could fix this silliness, but i can't :)
>
> > That said, I'll see if I can speed up "git blame" on the gcc repository.
> > It _is_ a fundamentally much more expensive operation than it is for
> > systems that do single-file things.
>
> SVN had the same problem (the file retrieval was the most expensive op
> on FSFS). One of the things i did to speed it up tremendously was to
> do the annotate from newest to oldest (IE in reverse), and stop
> annotating when we had come up with annotate info for all the lines.
> If you can't speed up file retrieval itself, you can make it need less
> files :)
> In GCC history, it is likely you will be able to cut off at least 30%
> of the time if you do this, because files often have changed entirely
> multiple times.

Unfortunately, we're doing that already.
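For readers who want the newest-to-oldest idea from the quoted text spelled out, here is a toy sketch in Python. It is not how git blame is implemented: it assumes a strictly linear history, uses difflib to match lines between a version and its parent, and ignores merges, renames and code movement. The names blame_newest_first, commits and load are made up for the illustration.

```python
from difflib import SequenceMatcher


def blame_newest_first(commits, load):
    """Attribute each line of the newest version to the commit that added it.

    commits: commit ids, newest first (toy assumption: linear history).
    load:    hypothetical callback returning the file's lines at a commit;
             it stands in for the expensive file retrieval.
    Yields (line_index_in_newest_version, commit_id) as soon as it is known.
    """
    if not commits:
        return
    lines = load(commits[0])
    # Maps a line index in the version being examined to the index of the
    # same line in the newest version. Only unattributed lines stay here.
    pending = {i: i for i in range(len(lines))}
    for pos, commit in enumerate(commits):
        if not pending:
            return  # every line is attributed: older versions are never loaded
        if pos + 1 == len(commits):
            # Oldest commit: whatever is still pending originated here.
            for final_idx in sorted(pending.values()):
                yield final_idx, commit
            return
        parent_lines = load(commits[pos + 1])
        matcher = SequenceMatcher(None, parent_lines, lines, autojunk=False)
        carried = {}
        for a, b, size in matcher.get_matching_blocks():
            for k in range(size):
                if b + k in pending:
                    # The line already existed in the parent: pass blame back.
                    carried[a + k] = pending.pop(b + k)
        # Lines with no counterpart in the parent were introduced by `commit`.
        for final_idx in sorted(pending.values()):
            yield final_idx, commit
        pending = carried
        lines = parent_lines


if __name__ == "__main__":
    versions = {
        "c4": ["a", "B", "c", "d"],
        "c3": ["a", "b", "c"],
        "c2": ["a"],
        "c1": ["x"],
        "c0": ["x", "y"],
    }
    order = ["c4", "c3", "c2", "c1", "c0"]
    for line_no, commit in blame_newest_first(order, versions.__getitem__):
        print(line_no, commit)
```

In this example the file was entirely rewritten in c2, so the walk stops once c2 has been compared against c1, and c0 is never requested. Because the function is a generator, the most recently changed lines come out first.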

One improvement that is already available is that we can do progressive
annotate: we can output lines we find in the order we find them, such
that lines that changed recently (which are usually the more interesting
ones) get annotated quicker. Obviously, you need a GUI-ish thing to do
this, because pagers don't like having stuff written out of order, but
there's a good chance that a user annotating fold-const.c will have the
info for the interesting lines in a few seconds, and go on while git is
still trying to find where the boring old lines came from.

There's also the possibility of generating caches of commit:file pairs
you've annotated, which would make generating the annotation for
something you'd annotated for a recent commit blindingly fast.

	-Daniel
*This .sig left intentionally blank*