Re: impure renames / history tracking

Andreas Ericsson <ae@xxxxxx> · Thu, 02 Mar 2006 23:06:00 +0100

Paul Jakma wrote:
On Wed, 1 Mar 2006, Andreas Ericsson wrote:

It's completely impossible to fold *ALL* the history into a single 
commit, and since you want heuristics I would imagine you wouldn't 
want that either.

I want to know whether additional meta-data to help the existing 
heuristics would be acceptable. From a discussion on #git yesterday I 
gather the best way forward would be to first prototype something 
keeping state in a file in .git.

All that's needed really is something that relates the following 3 things:

    commit-id obj1-id obj2-id

I.e.: For <commit-id>, <obj1-id> is similar to <obj2-id>.

Maintaining this state could be done via the git-mv/rename wrappers and 
an additional git-edit wrapper. Those who are quite happy with the 
existing diff-input-only similarity heuristics obviously wouldn't have 
to bother using a git-edit wrapper; those who want to let git gather 
additional 'similarity hints' in this way could.
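
A minimal sketch of what such a prototype might look like, assuming the 
state is kept one relation per line in the 'commit-id obj1-id obj2-id' 
format above. The file name "similarity-hints" and both function names 
are hypothetical; the thread only says "a file in .git", and nothing 
like this exists in git itself.

```python
from pathlib import Path

# Hypothetical file name; the proposal only says "a file in .git".
HINTS_FILE = "similarity-hints"


def record_hint(git_dir, commit_id, obj1_id, obj2_id):
    """Append one 'commit-id obj1-id obj2-id' line to the hints file."""
    path = Path(git_dir) / HINTS_FILE
    with path.open("a", encoding="ascii") as fh:
        fh.write(f"{commit_id} {obj1_id} {obj2_id}\n")


def hints_for_commit(git_dir, commit_id):
    """Return the (obj1-id, obj2-id) pairs recorded for commit_id."""
    path = Path(git_dir) / HINTS_FILE
    if not path.exists():
        return []
    pairs = []
    for line in path.read_text(encoding="ascii").splitlines():
        fields = line.split()
        # Skip malformed lines instead of failing on them.
        if len(fields) == 3 and fields[0] == commit_id:
            pairs.append((fields[1], fields[2]))
    return pairs
```

A git-mv or git-edit style wrapper would call record_hint() once the 
commit id is known, and a similarity heuristic could consult 
hints_for_commit() before falling back to content comparison.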

Aside:

Git might be easier to extend generally if it adopted just /one/ new 
core header, say "see-also" - that could serve as a pointer to arbitrary 
commit-related meta-info objects that aren't of immediate interest to 
either:

a) core git

or

b) the user

Things that aren't of interest to either core git or the user are already 
handled properly. It's called "cruft". ;)

However, I see what you're trying for here. Something like the X-* 
headers inside a mailer. Not all MUAs understand them, but those that do 
can make use of them to the user's benefit.

Format:

    see-also <word> <obj-id>

E.g.:

    see-also similars <obj-id>

Where <obj-id> would list the 'commit obj1 obj2' relations, but (the 
commit being implied by the header's location) just as:

    obj1 obj2

Would ultimately be neater than fishing around in .git/, and would allow 
other extensions in the future too.
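
As a rough sketch of how a tool might read such a header, assuming the 
raw text of a commit object is already in hand and that headers end at 
the first blank line, as they do in git commit objects. "see-also" is 
only the header proposed above, not something git implements, and both 
function names are hypothetical.

```python
def parse_see_also(commit_text):
    """Collect proposed 'see-also <word> <obj-id>' headers from a raw commit.

    Returns a dict mapping each <word> to the list of <obj-id>s seen.
    Only the header block (everything before the first blank line) is read.
    """
    refs = {}
    for line in commit_text.split("\n"):
        if line == "":
            break  # blank line: headers are over, the message follows
        fields = line.split(" ")
        if len(fields) == 3 and fields[0] == "see-also":
            refs.setdefault(fields[1], []).append(fields[2])
    return refs


def parse_similars(object_text):
    """Parse the 'obj1 obj2' lines of a hypothetical 'similars' object."""
    pairs = []
    for line in object_text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    return pairs
```

A tool that does not know a given <word> would simply ignore that 
entry, which is what would let the header carry later extensions.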

The <word> identifier preferably would need to be centrally co-ordinated.

With X-* headers I don't see why it should have to be. Only the X-* part 
is mentioned in the RFC, so with a proper format Junio won't have to 
coordinate cross-SCM tools, git-tortoise, etc, etc...

I'm confused. First you say you want to have one single mega-patch for 
each commit, then you say you want to be able to follow history back. 
It's like deciding to throw away your wallet and then trying to get 
someone to pick it up and carry it around for you.

I'm not sure why you think mega-patch. Collapsing a bunch of commits related 
to one project need not result in a big patch relative to the repository 
as a whole.

Mainly I think it's because you mentioned several renames of a single 
file and many files renamed + rewritten (beyond git's current ability to 
recognize it). That's definitely a mega-patch in my book.

Where the project concerned is like BSD, not 
just a kernel but a complete userland (so 1.1GB of source code).

<just curious>
Such a large project surely must be split in several smaller 
sub-projects? GNU is, after all, several small (and not so small) 
components. X works the same way. Linux is a large project, but each 
compartment of code can be managed on its own, so long as they adhere to 
the ABI hooking them back in to the kernel core.
</just curious>

--
Andreas Ericsson                   andreas.ericsson@xxxxxx
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231