Re: Converting to Git using svn-fe (Was: Speeding up the initial git-svn fetch)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 21 Oct 2010, Will Palmer wrote:
> On Wed, 2010-10-20 at 22:44 +0200, Jakub Narebski wrote:

> > Because (from what I understand) revisions in Subversion are whole
> > project all-branches snapshots, and because revision identifiers are
> > monotonically incrementing numbers, there is no inherent notion of
> > _parent_ of commit, like there is in Git.  (I think that was the reason
> > why merge tracking was absent from Subversion until version 1.5, and
> > why mergeinfo is per-file rather than per-commit/per-revision property).
> 
> To clarify, I was saying that there is a "parent" of each SVN commit, in
> the top-level sense. This can be easily converted into a "whole
> repository" ("svnroot") tree in git. Of course, this isn't useful for
> actual work, but it's a good middle-layer, from which other more-useful
> things can be derived.

"Whole repository hierarchy (snvroot) snapshots" are useless without
extra work; Git needs "whole project" snapshots for its commits.

But the whole long description of "branching" model in Subversion was
meant as intro for explanation why there can be mishandled commits
in Subversion, which make it impossible to have 1-to-1 SVN revision to
Git commit mapping.
 
> In terms of converting the svnroot git history into actual branches,
> there are several options for mapping things. Ignoring merges for a
> moment, we could (for example) notice when two trees (as in tree
> objects) are very similar at some point in history, and decide that
> those are probably branches.

Actually as Stephen Bash wrote in his response creating branches in
Subversion generates 'copy' operations in svndump... we have to filter
out 'copy' operations which do not create new branches, though.

> It's tedious, but still fairly simple, to 
> walk the history and build a new history consisting only of edits to a
> subtree (even if the commit messages don't always make sense out of
> context). It really doesn't matter one lick whether a single svn commit
> touched multiple generated git commits.

We would have to ensure that commits in Git in branch 'foo' are the same
as history of 'project/branches/foo' subtree in svnroot in Subversion.
Otherwise we would either have different history in Git and in Subversion,
or we would have screwed up DAG of revisions in Git.

> Of course, "ignoring merges" is temporary and a total cop-out, but I
> wouldn't for a moment pretend that converting svn branches into git
> branches is difficult.

I don't think the most common "sane" Subversion merge case would be
difficult to translate into merge commit in Git: the svn:mergeinfo 
property would have common revisions for all affected files/directories.

The problem is that like it is possible to mishandle commit like described
by Stephen Bash by creating all-branches revision, it is also possible
to mishandle merge in Subversion, creating revision where different files
are merged from different branches: such thing does not have easy
translation to Git commit-level rather than file-level merge tracking.
 
[...]

> > So to have the same results for 'svn log' when on branchs 'foo' and
> > 'bar' (however you switch branches in subversion), or 
> > 'svn log <foo URL>' and 'svn log <bar URL>' like for 'git log foo'
> > and 'git log bar' in the [mishandling] situation described above
> > you have to map single all-branches revision 4 in Subversion into
> > two commits 4' and 4'' in Git.
> > 
> > 
> > Please correct me if I am wrong about Subversion model.
> 
> Also correct. One SVN commit would logically map to several git commits.
> It's best to think in terms of:
> ([svn commit] + [svn path]) -> [git commit] (or git tag, if we can get
> the heuristics right)

If I remember correctly some of discussion was whether there can truly
be irrecovable situation where single SVN revision *must* be mapped into
more than one Git commit (one-to-many mapping).

> > > The difference of course is that the "name" of an svn revision stays the
> > > same even if aspects of that revision (for example, the commit message)
> > > are changed, while the "name" of a git commit is dependent on everything
> > > that makes up a commit. In git terms, changing a commit message is
> > > considered to be history rewriting, whereas in svn terms it is merely
> > > something which happens occasionally as part of regularly maintained
> > > repository.
> > > 
> > > the git Philosophy is ingrained in its object model: If you change
> > > something which led to a state, you change the state itself. I don't
> > > think there should be an attempt to work-around that philosophy when
> > > talking to external repositories. That is to say: if a commit message
> > > (or other revprop) in history changes, we want to treat it as if we were
> > > recovering from an upstream rebase. Of course, a problem in that could
> > > very well be "how would we know about it?", which is a good question,
> > > but one not directly related to [revision+directory]<->[commit]
> > > mappings, afaik ;)
> > 
> > Better solution, actually proposed in separate subthread, is to make use
> > of new 'git replace' / 'refs/replaces/*' feature in Git, creating 
> > replacement for revision which changed some property retroactively...
> 
> I'm not entirely familiar with the git replace mechanism, but wouldn't
> that mean that repository git-A (cloned from SVN before the property
> change) and repository git-B (cloned from SVN after the property change)
> would be unable to merge with each-other?
> In my mind, if it would be a rebase when it happens in git-land, it
> should be a rebase when it happens in
> mechanism-to-make-external-repository-act-just-like-git land.

Note that there is problem with possibly changing svn:log, svn:author and
svn:date revision properties is only when there is ongoing interaction
between Subversion repository (or mirror) and Git repository (or mirror).
There is no problem with this issue when doing one-shot conversion.
 
The major problem is that svn:log etc. are _unversioned_ properties (see
http://svnbook.red-bean.com/en/1.5/svn.ref.properties.html), so I am not
sure if there is a way for Subversion server to tell that some svn:log
properties changed.  Perhaps there is a log, even if properties are
unversioned... otherwise we would have to detect somehow that properties
changed.


But let's assume that we have a way of notifying or noticing that e.g.
svn:log property changed.

Say that svn:log property for revision 'n was A at the time Git fetched
from SVN repository, and SVN revision 'n' is mapped to commit AA with
commit message A.

Later we fetch again from SVN repository, and besides new revisions to
be converted we notice somehow that svn:log property for revision 'n'
changed from A to B.

We now create replacement commit BB in Git, with the same Git parent
as commit AA, and with commit message changed to BB.  Then we add
commit BB as replacement for AA:

  $ git replace -f AA BB

(or its low level equivalent, or its batch equivalent when it exists).
This replacement is saved as a ref in 'refs/replaces/*' namespace.  All
git commands (except some plumbing perhaps, and unless you pass 
'--no-replace-objects' option to git wrapper) would then work as if
commit AA was replaced by commit BB; in particular 'git show AA' and
'git log' would show BB version.

Because replacements are stored as refs in 'refs/replaces/*' namespace,
it is simple to transfer them.  Each repository that fetches those refs
(+refs/replaces/*:refs/replaces/*) would see replaced contents.  Those
that do not fetch it would see old contents (and perhaps would have
problems like iteracting with SVN repository).


Alternate solution, though not as natively nice, would be to have empty
or placeholder commit, and store true commit message in notes for commit
AA, i.e. the message A would be in git note for AA.  Changing commit
message would mean changing note: after change commit AA would have a
commit-message note with contents B.


If changes to unversioned revision properties are rare, then replacement
technique is much superior to using notes, which generates unnatural git
repository.  When changing commit messages (svn:log) and the like are 
common and often, which would result in great many replacements, the
notes technique could be better because of performance reasons.

> > ...if Subversion actually offer any way to ask for changed properties.
> > Thankfully from what I understand from comments in this thread this
> > feature of being able to change revision properties like commit message
> > or authorship is by default turned off in Subversion.
> 
> Any sufficiently large SVN-tracked project will use all of SVN's
> features, whether the maintainer remembers or not ;)

Heh.

> Certainly it could be a "few and far between" thing, which doesn't need
> to be handled to get going / usable (especially since creating a fresh
> clone is so much faster than with git-svn). I don't know the internals
> of SVN beyond what was mentioned in the manual 5 or so years ago, but I
> assume you'd need to pretty much iterate over the entire history in
> either a slow, git-svn like manner, or a wasteful, "download everything
> to check a few things" manner, just in order to check that your
> properties are up-to-date. Perhaps I'm thinking of these things wrongly,
> and there's actually a simple log-based mechanism for checking such
> things which would be fast enough to work into regular git-gc-ish
> maintenance.

Again: svn:log, svn:author and svn:date are Unversioned Properties, but
perhaps Subvrsion repository stores log of changes somewhere (similarly
to git reflog, though hopefully not expired too early).

P.S. The later in this thread, the more I see how utterly wrong
     Subversion model of version control is (branches, tags, merges).

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]