Re: More precise tag following

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sat, 27 Jan 2007 11:15:00 -0800 (PST)

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:
> 
> git rev-list and git log (with or without -p) perform poorly when invoked with
> a pathspec.

Really? I would say exactly the opposite. They _smoke_ when invoked with a 
pathspec.

Show me *one* other SCM that even comes close..

And please, realize that git does arbitrary combinations of directories, 
and not just single files. AND THAT IS IMPORTANT!

Any SCM that can't do

	git log drivers/scsi/ include/scsi/

and have it be a sane log of the changes to the _union_ of those two 
directories is strictly inferior to what git can do.
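
As a rough illustration of what "the union" means here, a toy model in Python. The commit names and the `HISTORY` table are made up for the example, and real git additionally simplifies history while walking it, which this sketch ignores:

```python
# Toy history, newest first: (commit id, paths changed by that commit).
# Hypothetical data, not real kernel history.
HISTORY = [
    ("c4", ["include/scsi/scsi.h"]),
    ("c3", ["fs/ext3/inode.c"]),
    ("c2", ["drivers/scsi/sd.c", "include/scsi/scsi.h"]),
    ("c1", ["drivers/net/tg3.c"]),
]


def matches(path, pathspecs):
    """True if `path` is one of the pathspecs or lies beneath one of them."""
    for spec in pathspecs:
        prefix = spec.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


def log(history, pathspecs):
    """List commits touching ANY of the pathspecs, each commit once."""
    return [
        commit
        for commit, changed in history
        if any(matches(path, pathspecs) for path in changed)
    ]


print(log(HISTORY, ["drivers/scsi/", "include/scsi/"]))
```

A commit that touches both directories (like `c2` above) shows up once, in order, which is the part that per-file history cannot give you.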

Usually this is something that others CANNOT DO AT ALL.

Even your 1:18 number is a hell of a lot faster than "can't do it", which 
is what you have for everything else I can imagine.

Maybe you just do single files, but my pathspecs tend to be directories or 
multiple files more often than single ones.

How the heck did you intend to cache that?

> I agree with those numbers.  However, on a converted KDE repo, they are
> *completely* different:
> 
> git log kdelibs/README takes 1:18.  One minute, eighteen seconds.
> git rev-list and git blame take roughly the same time.

Do you have the converted repo somewhere to be cloned from? It's going to 
be a lot more interesting for scalability testing than anything else.

It is possible, for example, that the real issue is that we shouldn't 
compress delta objects in a pack.
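
To make that hypothesis concrete, a minimal sketch that counts zlib inflate calls when reading an object at the end of a delta chain. The `Pack` class and the prefix-only delta encoding are toy assumptions for illustration; this is not git's pack format or its delta format:

```python
import struct
import zlib


def make_delta(base, target):
    """Toy delta: keep the longest common prefix of base, append the rest."""
    n = 0
    while n < min(len(base), len(target)) and base[n] == target[n]:
        n += 1
    return struct.pack(">I", n) + target[n:]


def apply_delta(base, delta):
    """Rebuild the target from a base and a toy delta."""
    (n,) = struct.unpack(">I", delta[:4])
    return base[:n] + delta[4:]


class Pack:
    """Hypothetical object store: full objects are always deflated,
    deltas are deflated only if compress_deltas is set."""

    def __init__(self, compress_deltas):
        self.compress_deltas = compress_deltas
        self.objects = {}  # name -> (parent name or None, stored bytes)
        self.inflates = 0  # number of zlib.decompress calls so far

    def add(self, name, data, parent=None):
        if parent is None:
            self.objects[name] = (None, zlib.compress(data))
            return
        delta = make_delta(self.read(parent), data)
        if self.compress_deltas:
            delta = zlib.compress(delta)
        self.objects[name] = (parent, delta)

    def read(self, name):
        parent, stored = self.objects[name]
        if parent is None:
            self.inflates += 1
            return zlib.decompress(stored)
        base = self.read(parent)  # walk the chain down to the full object
        if self.compress_deltas:
            self.inflates += 1
            stored = zlib.decompress(stored)
        return apply_delta(base, stored)
```

With a chain of four deltas on top of one full object, reading the tip costs five inflates when deltas are compressed and one when they are not. Whether that trade is worth the extra pack size is exactly what would need measuring on a real repo.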

> That's what we were getting at.  Not the superiority of git blame (no irony)
> and thus reduced speed, but the algorithmic deficiency of any operation on a
> pathspec/object, which can be easily fixed.

The thing is, one of the reasons the git object database is small is that 
it compresses really well, and I suspect that for the KDE repo, what 
you're seeing is really a combination of:

 - the KDE people were idiots in the first place to make it into one big 
   repo

 - we've consciously made repo size be a major goal, and yes, we spend a 
   lot of CPU as a result, following delta chains etc. The zlib overhead 
   is more visible, because once you've uncompressed the delta the delta 
   itself is really quick to apply, but the whole "trees compress really 
   well" all boils down to the same thing: we create lots of small 
   objects, and we have tons of deltas, and the hierarchical nature of the 
   data structures (ie saving the trees not as one big manifest but as 
   a more complex hierarchical data structure) is what allows us to do tons 
   of the path-based optimizations.

But they all do end up boiling down to "we use lots of CPU".
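
The path-based optimization that the hierarchical trees buy can be sketched roughly as follows. The `Tree` class and the hashing scheme are simplified stand-ins (assumptions for the example), not git's actual object encoding:

```python
import hashlib


def obj_id(obj):
    """Content id of a Tree or of a blob (a plain str in this toy model)."""
    if isinstance(obj, Tree):
        return obj.id
    return hashlib.sha1(b"blob\0" + obj.encode()).hexdigest()


class Tree:
    """Toy tree object whose id is derived from its entries' names and ids."""

    def __init__(self, entries):
        self.entries = entries  # name -> Tree or str
        h = hashlib.sha1(b"tree\0")
        for name in sorted(entries):
            h.update(name.encode() + b"\0" + obj_id(entries[name]).encode() + b"\n")
        self.id = h.hexdigest()


def child(obj, name):
    """Entry `name` of obj, or None if obj is not a tree or lacks it."""
    return obj.entries.get(name) if isinstance(obj, Tree) else None


def changed(old, new, parts, stats):
    """True if the path given by `parts` differs between two trees."""
    stats["compared"] += 1
    if old is None and new is None:
        return False  # path missing on both sides
    if old is not None and new is not None and obj_id(old) == obj_id(new):
        return False  # identical subtree: nothing below it can differ
    if not parts:
        return True
    return changed(child(old, parts[0]), child(new, parts[0]), parts[1:], stats)
```

When two commits differ only somewhere under one directory, a query for a path under a different directory stops at the first pair of identical subtree ids and never looks inside. That pruning is cheap in comparisons, but each tree visited still has to be read out of the pack first.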

And I suspect tweaking the existing stuff is quite reasonable. But we need 
to have a public repo that people who want to tweak can play with (for 
example, the old "linux-history" archive was what made us tweak things 
like gitk, which was horribly horribly bad).

So please point to a kde conversion archive to play with (maybe you have, 
I missed it).

		Linus