Re: impure renames / history tracking

Paul Jakma <paul@xxxxxxxx> · Wed, 1 Mar 2006 18:50:21 +0000 (GMT)

Hi Linus,

On Wed, 1 Mar 2006, Linus Torvalds wrote:

The thing is, it does better than anything that _tries_ to be 
"reliable".

I can pretty much _guarantee_ that you can't do it better.

I'm willing to take that argument to the 'project' concerned, I just 
need to be pretty sure of it.

Tracking "inodes" - aka file identities - (which is what BK does, 
and I assume what SVN does) is fundamentally problematic. In 
particular, it's a horrible problem when two inodes "meet" under 
the same name. You now have two identities for the same file, and 
you're fundamentally screwed.

Yes, in that model it is. This, interestingly, is not the BK model, I 
suspect (see below).

It doesn't even need renames to be a problem. JUST THE FACT THAT 
YOU TRY TO TRACK FILE "IDENTITY" HISTORY IS BROKEN.

If it's "file identity" globally across the lifetime of the project, 
I agree 100%. The 'traditional' SCM concerned does this.

That's not a solution I'd want to explore either; I'm only 
interested in the identity of files for any /one/ commit. In 
saying that, I recognise it's pointless to try to annotate file-change 
information in multi-parent commits (merges).

For example, take CVS, which doesn't actually try to do renames, 
but _does_ try to track the identity of a file, since all the 
history is tied into that identity: think about what happens in 
Attic when a file is deleted. Completely broken model.

ACK, {Attic,deleted_files}/ is just horrid.

And that's really fundamental. CVS doesn't show the problems so 
much, because CVS actively tries to make it hard to do these 
things.

ACK.

With renames-tracking-file-identities, it's _really_ easy to get 
some major confusion going. What happens when one branch creates a 
file, and another one renames a file to that same name, and they 
merge?

Well, the conflict has to be resolved somehow, even today.

Don't tell me it doesn't happen. It happened under BK. The way BK 
"solved" it was to keep the two separate identities: one of them 
got resolved to the new filename, the other one went into the 
"deleted" directory.

Right. That's what the 'traditional workflow' SCM I'm thinking of 
does - not BK funnily enough, but an SCM predating BK which also 
happens to use SCCS files, and with some of the same high-level 
push/pull constructs as BK (interestingly).

It also tracks name history globally using a deleted_files/ history, 
which is maintained, but I don't think it does this for name merges 
like the above.

In the one I'm thinking of, it does (I /think/, I'm not an expert in 
it) the following:

Given two files, say:

old:

1.1---1.2---1.3

new:

1.1

- constructs a 'fake' base SCCS revision, empty
- adds the top 'old' version as a branch
- adds the top new version as a new delta

   1.1.1.1
  /
1.1---------1.2

Where in the merged file:

	1.1: empty
	1.1.1.1: was 1.3 from 'old'
	1.2: is 1.1 from 'new'
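
To make the three steps above concrete, here is a rough sketch in Python. It is only my reading of the behaviour described, not that SCM's actual code: the function name and the (revision, content) representation are made up for illustration, and real SCCS files store deltas rather than full contents.

```python
def merge_sccs_histories(old_deltas, new_deltas):
    """Combine two per-file histories into one, per the steps above.

    old_deltas / new_deltas: lists of (revision, content) tuples, oldest
    first. This representation is hypothetical; real SCCS stores deltas.
    Returns a dict mapping revision id -> content for the merged file.
    """
    _, old_top = old_deltas[-1]   # only the newest 'old' version survives
    _, new_top = new_deltas[-1]   # likewise for 'new'
    return {
        "1.1": "",               # fake, empty base revision
        "1.1.1.1": old_top,      # top of 'old' hangs off the base as a branch
        "1.2": new_top,          # top of 'new' becomes the next trunk delta
    }


old = [("1.1", "a\n"), ("1.2", "a\nb\n"), ("1.3", "a\nb\nc\n")]
new = [("1.1", "x\n")]
merged = merge_sccs_histories(old, new)
```

Note that in this sketch the intermediate 'old' revisions (1.1 and 1.2) are not carried over; whether the real tool preserves them somewhere I don't know.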

However, it does /not/ create a deleted_files entry for the 'old' 
file. (AFAICT - I may not have a sufficiently full understanding of 
this SCM)

Guess what happens when the side that got merged into "deleted" 
continues to edit the file? That's right - their edits happen on 
the deleted file, and never show up in the real tree in a 
subsequent merge ever again.

Indeed - horrid.
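
A toy model of that failure, in Python, just to pin down what goes wrong. It assumes a tree is nothing more than a map from file identity to path, and that the loser of a name clash is parked under a "deleted/" prefix; the ids "A1" and "B7" and both helper functions are invented here and are not how BK stores anything.

```python
DELETED = "deleted/"


def merge_trees(ours, theirs):
    """Merge two {file_id: path} maps; on a name clash 'ours' keeps the name."""
    merged = dict(ours)
    taken = set(ours.values())
    for file_id, path in theirs.items():
        if file_id in merged:
            continue
        if path in taken:
            path = DELETED + path   # the losing identity gets parked
        merged[file_id] = path
        taken.add(path)
    return merged


def apply_edit(tree, contents, file_id, text):
    """An edit follows the file identity, not the name the user sees."""
    contents[tree[file_id]] = text


# Branch A creates a new file "foo" (identity "A1").
branch_a = {"A1": "foo"}
# Branch B renames an existing file (identity "B7") to "foo".
branch_b = {"B7": "foo"}

tree = merge_trees(branch_a, branch_b)
contents = {}
# Branch B keeps editing what it still thinks of as "foo", then merges again.
apply_edit(tree, contents, "B7", "new work from branch B")
```

After the second merge the new work sits under "deleted/foo", and nothing has touched the visible "foo" at all.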

And as far as I can tell, BK really did the best you can do. 
Following file identities really _is_ fundamentally broken. It 
sounds like a nice idea, but while you might solve a few problems, 
you create a whole raft of much more fundamental problems.

For tracking identity across more than one commit - I fully agree.

That's not quite what I'm thinking of though. Is it worth going on 
with the discussion on the basis of:

	 'track identities *only* from context of /the/ parent to
          this commit'?

So next time you think about a merge that might have been improved 
by tracking renames, please also think about a merge where one of 
the filenames came from two or more different sources through an 
earlier merge, and thank your benevolent Gods that they instructed 
me to make git be based purely on file contents.

Oh, I agree muchely here.

I wouldn't change git. I only wonder if it could give its rename 
heuristics an additional advisory-only hint? (for single-parent commits 
at least - never merges - and only on a per-commit basis).
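
Roughly what I have in mind, as a Python sketch. This is not git's rename detection: the function, the `hints` argument, the threshold and the bonus are all hypothetical, and similarity here is simply difflib's ratio. The point is only that a hint can break a tie between candidates the content check already accepts, and can never override the content itself.

```python
from difflib import SequenceMatcher


def detect_renames(deleted, added, hints=None, threshold=0.5, hint_bonus=0.1):
    """Pair deleted paths with added paths by content similarity.

    deleted / added: {path: content} for files removed / created by one
    single-parent commit. hints: optional {old_path: new_path} recorded by
    the committer. A hint only nudges the score; it is never trusted alone.
    """
    hints = hints or {}
    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in unclaimed.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score < threshold:
                continue   # content disagrees: the hint cannot override it
            if hints.get(old_path) == new_path:
                score += hint_bonus   # advisory tie-breaker only
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del unclaimed[best_path]
    return renames


src = "int main(void) { return 0; }\n"
deleted = {"a.c": src}
added = {"b.c": src, "c.c": src, "notes": "#########"}

no_hint = detect_renames(deleted, added)
good_hint = detect_renames(deleted, added, hints={"a.c": "c.c"})
bad_hint = detect_renames(deleted, added, hints={"a.c": "notes"})
```

With two identical candidates the hint picks "c.c" over "b.c"; a hint pointing at "notes", whose content has nothing in common with "a.c", is simply ignored.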

I probably should first explore how git deals with rename clashes..

regards,
--
Paul Jakma	paul@xxxxxxxx	paul@xxxxxxxxx	Key ID: 64A2FF6A
Fortune:
I'm glad I was not born before tea.
		-- Sidney Smith (1771-1845)