Hi David,

David Barr wrote:

> This python script walks the commit sequence imported by svn-fe.
> For each commit, it tries to identify the branch that was changed.
> Commits are rewritten to be rooted according to the standard layout.

I like the idea and especially that the heuristics are simple.

Maybe this could be made git-agnostic using the new ls-tree command
you are introducing in fast-import?  Though it would need to get a
revision list from somewhere.

Alternatively, do you think it would make sense for something like
this to be implemented as a filter or observer of the fast-import
stream as it is generated during an import?

> A basic heuristic of matching trees is used to find parents for the
> first commit in a branch and for tags.

More precisely, the rule used is:

> +    # Find a common path prefix in the changes for the revision
> +    subroot = ""
> +    changes = Popen(["git","diff","--name-only",parent,git_commit], stdout=PIPE)
> +    for path in changes.stdout:
> +        match = subroot_re.match(path)
> +        if match:
> +            subroot = match.group()
> +            changes.terminate()
> +            break

The first change lying in one of trunk branch/* tags/* determines the
branch.  When a branch is renamed, this has a 50/50 chance of choosing
the right branch.

> +    # Choose a parent for the rewritten commit
> +    if ref in ref_commit:
> +        parent = ref_commit[ref]
> +    elif subtree in tree_commit:
> +        parent = tree_commit[subtree]
> +    else:
> +        parent = ""

If this is a live branch, the parent is the last commit from that
branch.  Otherwise, we take the last commit whose resulting tree
looked like this one.  Or...

> +    # Default to trunk if the branch is new
> +    if parent == "" and "refs/heads/trunk" in ref_commit:
> +        parent = ref_commit["refs/heads/trunk"]

... if all else fails, we take the tip commit on the trunk.

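Restated as a standalone function, the parent rule quoted above looks
roughly like the sketch below.  The name choose_parent and its
signature are made up for illustration; ref_commit and tree_commit are
the mappings from the patch, passed in as arguments here so the snippet
runs on its own.  It is only a summary of the three-step fallback, not
the script's actual code.

```python
TRUNK_REF = "refs/heads/trunk"


def choose_parent(ref, subtree, ref_commit, tree_commit):
    """Return the parent for a rewritten commit, or "" if none is known.

    Hypothetical helper: name and signature are illustrative only.
    ref_commit maps a ref name to its latest rewritten commit;
    tree_commit maps a tree id to the last commit that produced it.
    """
    # 1. Live branch: continue from the last commit on that ref.
    if ref in ref_commit:
        return ref_commit[ref]
    # 2. New ref: reuse the last commit whose tree matched this one.
    if subtree in tree_commit:
        return tree_commit[subtree]
    # 3. Otherwise fall back to the trunk tip, if trunk exists yet.
    return ref_commit.get(TRUNK_REF, "")
```

Written this way, it is easy to see that step 2 depends only on tree
identity, so a branch created by a plain copy finds its parent, while a
branch that is copied and modified in the same revision falls through
to step 3.
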
For comparison, here's the git-svn rule:

>     # look for a parent from another branch:
>     my @b_path_components = split m#/#, $self->{path};

Among the paths above this commit's base directory [if this is
branches/foo, examine first branches/foo, then branches, then /]:

>     while (@b_path_components) {
>         $i = $paths->{'/'.join('/', @b_path_components)};
>         last if $i && defined $i->{copyfrom_path};
>         unshift(@a_path_components, pop(@b_path_components));
>     }
>     return undef unless defined $i && defined $i->{copyfrom_path};

Find the first one with copyfrom information (i.e., that was renamed
or copied from another rev in this revision).

>     my $branch_from = $i->{copyfrom_path};
>     if (@a_path_components) {
>         print STDERR "branch_from: $branch_from => ";
>         $branch_from .= '/'.join('/', @a_path_components);
>         print STDERR $branch_from, "\n";
>     }

Build back up the URL (so if branches was renamed to Branches but
branches/foo had no copyfrom information, we look for Branches/foo).

[...]
>     my $gs = $self->other_gs($new_url, $url,
>                              $branch_from, $r, $self->{ref_id});
>     my ($r0, $parent) = $gs->find_rev_before($r, 1);

Find the last revision that changed that path and record it.

Maybe we could benefit from including the copyfrom information in the
fast-import stream output by svn-fe somehow?  The simplest way to do
this would be some specially formatted comments.  An alternative (in
the spirit of Sam's earlier suggestions) might be to represent it in
the tree svn-fe creates, for example by introducing dummy
foo.copiedfrom symlinks.

Thanks, that was interesting.
Jonathan