Re: More precise tag following

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sat, 27 Jan 2007 17:46:14 +0100 (CET)

Hi,

On Sat, 27 Jan 2007, Simon 'corecode' Schubert wrote:

> Johannes Schindelin wrote:
> > It is slower than Subversion's counterpart, just because SVN's blame sucks.
> > You cannot find out the _relevant_ information easily, i.e. once you merged
> > something, the _merge_ gets attributed for the change (at least the last
> > time I tried it).
> > 
> > So, don't blame blame for being useful in git.
> 
> Your reasoning is backwards.  Git's blame (or fwiw, rev-list path.name) 
> is not slower because it is doing a better job (I can't tell, I don't 
> use svn), but because it uses an algorithm which doesn't scale.  
> rev-list and blame are O(number of commits between HEAD and root) and 
> not O(number of commits affecting path).

Ah, I think you fall in the "files matter" trap.

My point is: for what git does it does not need information which might or 
might not be present, but it derives that information which was there from 
the beginning: the ancestry path.

Many people don't use or even need blame. And what you want to introduce 
would affect them, too.

That is why I proposed a cache (of precomputed data): you don't have to 
change _anything_ in the file format, but you can speed the processes up 
-- locally! -- if they matter to you.

Which means it works on old repositories, too.

> It might be sufficient for git.git, but certainly not for projects with 
> a long history.  we are talking KDE, FreeBSD, OOo, something like this.  
> They each got about 400k commits.  It takes literally *minutes* to get a 
> rev-list or a blame for a certain path.  The algorithm simply does not 
> scale.  And this has nothing to do with superior output, because hg does 
> it in O(num_of_file_revs), so it *can* be done.

But can hg do it that fast, if you track code _movement_ between files? I 
doubt so.

I don't know if git can, at the moment, but even if it cannot, in future 
versions this may well be possible, exactly because we do _not_ rely on 
metadata to be stored in the objects, which can be derived from the 
history as-is anyway.

> > Of course, you could introduce a cache, but then, I don't run blame 
> > _that_ often.
> 
> I don't think a cache is the right way.  I'd call the right idea 
> "auxillary information".

You can name it "Dirty Harry" if you want.

The important part is that you should not change the file format when you 
do not have to.

Rather, calculate the information you need from the existing data, and if 
you can reuse it, store it locally. _That_ is flexibility.

It also gives me a warm fuzzy feeling that no bogus "auxillary 
information" can be introduced by fetching from somewhere else. (It does 
not matter if intended or unintended.)

And if something is wrong with that "auxillary information", it can be 
regenerated correctly, without touching the real data -- the commit 
ancestry.

Just think of .git/info/refs: this data is derived from the repository, 
but because you need it so often (or it would be prohibitively expensive 
to do otherwise), it is derived only when needed, then stored, and 
retrieved quite often.

> > Besides, we already introduced an orthogonal historisation by reflogs, 
> > and your method would not cope gracefully with that, would it?
> 
> I don't see how reflogs can play into this.  After all we're talking 
> about the series of commits the blob experienced to get into its current 
> state, not the series of actions it took this repo to contain this blob.

My point was that you want to introduce a reverse mapping onto the history 
DAG. But this claims that there is only one history you can possibly look 
at. This assumption is wrong.

It can make a lot of sense to git-blame a change on a pull, maybe because 
you don't want to fix it yourself, but throw it all back to the lieutnant 
whom you pulled that part from.

You could find that pull (in theory; I don't think it works right now) 
with git-blame walking the _reflogs_ instead of the _commit history_.

In this case, your reverse mapping would be wrong.

See?

Ciao,
Dscho

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html