Re: [bug] git svn fetch: defect history - missing merges and wrong tag ancestors

On 2020-12-25 23:24, M. Buecher wrote:
Dear all,

I finally had the time to start converting some older Subversion
repositories to git repositories and ran into issues with repos using
"parallel" branches, so-called vendor branches [1] in Subversion, which
track upstream changes via snapshots in a separate branch and merge
them into the custom build on trunk.

I just want to convert the Subversion history as-is to git.
I studied the git-svn reference docs [2] plus the related chapter of
the ProGit book [3] and believe I understand how git-svn works.
Still, I'm not a git expert, just a sporadic user.

Somehow `git svn fetch` (2.29.2.windows.3) loses merge information
between the vendor branch and trunk, and tags reference the
predecessors instead of the original ancestor.
Maybe there is a manual way to fix this for that small repository, but
it wouldn't be feasible for larger repositories, which is why I decided
to write this bug report.
Is there anything I missed? (see my procedure below)

Fortunately I ran into these issues quite early, and with a very
small repository of just 11 commits, so I can provide a small
reproduction case (see below after the links).
I hope you can enhance `git svn fetch`.

I tested further with two re-created Subversion repositories (both attached): one that looks the same but ensures that svn:mergeinfo is present (Subversion >=1.5 via "cherry pick merge", no "2-URL merge"), and one in which the vendor branch is a copy of trunk. Only when the vendor branch was a copy of trunk did git svn get the merges right. Otherwise, even with svn:mergeinfo, it does not record the merges.

Assumption:
It seems that git svn handles trunk and branches in a special way, but Subversion does not actually have branches. In Subversion there are just directories and files; a branch or tag is just a copy of another directory at a given revision, and merges can happen between any directories regardless of whether they have related ancestry.
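
To illustrate the point about copies, here is a minimal sketch (separate from the attached script) that lists every copy recorded in a dump stream. It relies only on the `Revision-number`, `Node-path`, `Node-copyfrom-rev` and `Node-copyfrom-path` headers, and assumes the copyfrom revision precedes the copyfrom path, as the attached script does too. The function name `list_copies` is made up for this example, and file contents that happen to contain such lines would be misread.

```shell
# list_copies: read an svn dump stream on stdin and print one line per copy,
# in the form "r<rev>: <source path>@<source rev> -> <destination path>".
# Hypothetical helper; it does not decide what counts as a branch.
list_copies() {
  awk '
    /^Revision-number: /    { sub(/^Revision-number: /, "");   rev = $0 }
    /^Node-path: /          { sub(/^Node-path: /, "");         path = $0 }
    /^Node-copyfrom-rev: /  { sub(/^Node-copyfrom-rev: /, ""); fromrev = $0 }
    /^Node-copyfrom-path: / {
      sub(/^Node-copyfrom-path: /, "")
      printf "r%s: %s@%s -> %s\n", rev, $0, fromrev, path
    }
  '
}
```

Possible use: `svnadmin dump /path/to/repo | list_copies`, where the repository path is a placeholder.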

Workaround:
As git svn does not recognize all merges correctly (especially cross-branch copies), the lost links must be added manually. I wrote a small GNU awk script (attached) that determines all cross-branch copies from an `svnadmin dump`, so the first revision in which something got copied is known. Run git svn only up to that revision, fix the parents, then continue with git svn up to the next such revision. This helps git svn find the correct ancestors and correctly build the subsequent merges. The parents can be changed with `git replace -f --graft <commit> <existing correct parents> <additional correct parents>`. Additionally, before continuing with git svn, this `git replace` change can be made permanent with `git-filter-repo --force --replace-refs delete-no-add`.
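
A rough sketch of one round of this workaround follows. The helper names (`run`, `fetch_up_to`, `graft_parents`) and the `DRY_RUN` switch are made up for illustration. `BASE:$NUMBER` is one of the revision range forms listed in the git-svn docs [2]; that it also suits the very first round is an assumption of this sketch.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fetch_up_to REV: fetch from where git svn stopped up to revision REV.
fetch_up_to() {
  run git svn fetch -r "BASE:$1"
}

# graft_parents COMMIT PARENT...: give COMMIT exactly the listed parents
# (the existing correct ones plus the additional correct ones).
graft_parents() {
  commit=$1
  shift
  run git replace -f --graft "$commit" "$@"
}
```

With `DRY_RUN=1` set, `fetch_up_to 8` only prints the fetch command (revision 8 is just a placeholder). The commit to graft can be looked up after the fetch with `git svn find-rev r<N> <ref>`.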


Any help is appreciated, thanks in advance
Matthias Bücher


[Links]
[1] http://svnbook.red-bean.com/en/1.8/svn.advanced.vendorbr.html#svn.advanced.vendorbr.mirrored-sources
[2] https://git-scm.com/docs/git-svn
[3] https://git-scm.com/book/en/v2/Git-and-Other-Systems-Migrating-to-Git


[System Info]
git version:
git version 2.29.2.windows.3
cpu: x86_64
built from commit: d054eb1fc46ff23e7c95756a7c747e2f2864b478
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
uname: Windows 10.0 19042
compiler info: gnuc: 10.2
libc info: no libc information available
$SHELL (typically, interactive shell): C:\Program Files\Git\usr\bin\bash.exe

[Enabled Hooks]
none


[Commits in Subversion]
A Subversion repository dump is attached, plus a test where I
recreated the same history directly in a git repository without an
issue.

* trunk:  a------------d--e--h--i--j--k
*                     /     /
* vendor: a (empty)--b-----f
*                    ^     ^
* tags:              c     g

* Vendor releases: b, f
* Custom modifications: e, i, j, k
* Tags: really just used as tags, although internally in Subversion
they are branches. Therefore I want to create lightweight git tags,
although annotated git tags would be fine too.


[Expected Subversion to git repo conversion]
* trunk => branch "main"
* branches/* => branch "*"
* tags/* => tag "*"
* vendor/current => branch "vendor/current" (can be renamed later to
just "vendor")
* vendor/* => tag "vendor/*" (except for vendor/current)

Expected "tags":
vendor/5.4 = rev b (maybe c when annotated)
vendor/5.8 = rev f (maybe g when annotated)
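
The mapping above could be applied to the fetched remote refs roughly as follows. This is only a sketch: the function names `target_ref` and `apply_refs` are made up, the `refs/remotes/svn/` prefix matches the `--prefix="svn/"` used in the procedure below, and the tags it creates are lightweight tags pointing at whatever commit git svn recorded for the tag ref, so it does not fix the wrong ancestors.

```shell
# target_ref REF: print the local ref that a git-svn remote ref should become.
target_ref() {
  case $1 in
    refs/remotes/svn/trunk)  echo "refs/heads/main" ;;
    refs/remotes/svn/tags/*) echo "refs/tags/${1#refs/remotes/svn/tags/}" ;;
    refs/remotes/svn/*)      echo "refs/heads/${1#refs/remotes/svn/}" ;;
    *) return 1 ;;
  esac
}

# apply_refs: create or update the local refs inside the converted repository.
apply_refs() {
  git for-each-ref --format='%(refname)' refs/remotes/svn |
  while read -r ref; do
    git update-ref "$(target_ref "$ref")" "$(git rev-parse "$ref")"
  done
}
```

For example, `target_ref refs/remotes/svn/tags/vendor/5.4` prints `refs/tags/vendor/5.4`.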


[Wrong Results]
Merges from "vendor/current" branch to trunk get lost.
Tags are referencing the predecessors of the expected commit.


[Procedure]
```
### a) preparation
cd /d/Coding
#
cd dd-formmailer
svn log --xml --quiet | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /' > authors-transform.txt
## edit authors-transform.txt accordingly

### b) git-svn adapted from ProGit book, but with "svn/" prefix
cd /d/Coding
git svn init --stdlayout --no-metadata --prefix="svn/" -- 'svn+ssh://svn@vcs/dd-formmailer' dd-formmailer-git
cd dd-formmailer-git
## edit .git/config for additional vendor branches and tags
<< __EOF
...
[svn-remote "svn"]
	...
	fetch = trunk:refs/remotes/svn/trunk
#	branches = vendor/current:refs/remotes/svn/vendor/current ## non-glob definition not working for branches
	branches = vendor/{current}:refs/remotes/svn/vendor/*
	branches = branches/*:refs/remotes/svn/*
	tags = vendor/*:refs/remotes/svn/tags/vendor/*
	tags = tags/*:refs/remotes/svn/tags/*
__EOF
##
cat .git/config
git svn fetch --authors-file ../dd-formmailer/authors-transform.txt
#
git branch -vv --list ; git for-each-ref
gitk --all &
```

[Log]
$ cat .git/config
[core]
        repositoryformatversion = 0
        filemode = false
        bare = false
        logallrefupdates = true
        symlinks = false
        ignorecase = true
[svn-remote "svn"]
        noMetadata = 1
        url = svn+ssh://svn@vcs/dd-formmailer
        fetch = trunk:refs/remotes/svn/trunk
        branches = vendor/{current}:refs/remotes/svn/vendor/*
        branches = branches/*:refs/remotes/svn/*
        tags = vendor/*:refs/remotes/svn/tags/vendor/*
        tags = tags/*:refs/remotes/svn/tags/*

$ git svn fetch --authors-file ../dd-formmailer/authors-transform.txt
r1 = 158d53044a5628897379403647a19ea13594b532 (refs/remotes/svn/vendor/current)
        A       _svn__guideline.txt
        A       _svn_client_config.txt
        A       _svn_dir_ignore_list.txt
r1 = a795b654edbc296e6f38da398f032a4851fd0a9e (refs/remotes/svn/trunk)
        A       dd-formmailer.css
        A       dd-formmailer.php
        A       lang/BrazilianPortuguese.php
        A       lang/Catalan.php
        A       lang/Danish.php
        A       lang/Deutsch.php
        A       lang/Dutch.php
        A       lang/English.php
        A       lang/Finnish.php
        A       lang/French.php
        A       lang/Greek.php
        A       lang/Italian.php
        A       lang/NorwegianBokmaal.php
        A       lang/Polish.php
        A       lang/Portuguese.php
        A       lang/Romanian.php
        A       lang/Russian.php
        A       lang/Slovak.php
        A       lang/Slovene.php
        A       lang/Spanish.php
        A       lang/Swedish.php
        A       lang/Turkish.php
        A       recaptchalib.php
r2 = 8fd8668dbed2dde6b55306c71b3b629f5ed794ec (refs/remotes/svn/vendor/current)
Found possible branch point: svn+ssh://svn@vcs/dd-formmailer/vendor/current => svn+ssh://svn@vcs/dd-formmailer/vendor/5.4, 1
Found branch parent: (refs/remotes/svn/tags/vendor/5.4) 158d53044a5628897379403647a19ea13594b532
Following parent with do_switch
        A       dd-formmailer.css
        A       dd-formmailer.php
        A       lang/BrazilianPortuguese.php
        A       lang/Catalan.php
        A       lang/Danish.php
        A       lang/Deutsch.php
        A       lang/Dutch.php
        A       lang/English.php
        A       lang/Finnish.php
        A       lang/French.php
        A       lang/Greek.php
        A       lang/Italian.php
        A       lang/NorwegianBokmaal.php
        A       lang/Polish.php
        A       lang/Portuguese.php
        A       lang/Romanian.php
        A       lang/Russian.php
        A       lang/Slovak.php
        A       lang/Slovene.php
        A       lang/Spanish.php
        A       lang/Swedish.php
        A       lang/Turkish.php
        A       recaptchalib.php
Successfully followed parent
r3 = 7ab4f436cafc8af3ed2e727a6d8cbef1a8f8b39f (refs/remotes/svn/tags/vendor/5.4)
        A       dd-formmailer.css
        A       dd-formmailer.php
        A       lang/BrazilianPortuguese.php
        A       lang/Catalan.php
        A       lang/Danish.php
        A       lang/Deutsch.php
        A       lang/Dutch.php
        A       lang/English.php
        A       lang/Finnish.php
        A       lang/French.php
        A       lang/Greek.php
        A       lang/Italian.php
        A       lang/NorwegianBokmaal.php
        A       lang/Polish.php
        A       lang/Portuguese.php
        A       lang/Romanian.php
        A       lang/Russian.php
        A       lang/Slovak.php
        A       lang/Slovene.php
        A       lang/Spanish.php
        A       lang/Swedish.php
        A       lang/Turkish.php
        A       recaptchalib.php
r4 = 7ec3663fd8e7c9fcfeb2742968a948d6978776d4 (refs/remotes/svn/trunk)
        M       dd-formmailer.css
        M       dd-formmailer.php
        A       dd-verify.php
r5 = e65ba2bdcd12a1935c1a327507dad6b7117f452b (refs/remotes/svn/trunk)
        A       calendar.gif
        A       date_chooser.js
        M       dd-formmailer.css
        M       dd-formmailer.php
        A       lang/Belarussian.php
        A       lang/Czech.php
        A       lang/Estonian.php
        A       lang/Japanese.php
        A       lang/Vietnamese.php
        M       recaptchalib.php
r6 = 9f95df7ce49c5d1cb4715017d223a8bd1c8dcffc (refs/remotes/svn/vendor/current)
Found possible branch point: svn+ssh://svn@vcs/dd-formmailer/vendor/current => svn+ssh://svn@vcs/dd-formmailer/vendor/5.8, 3
Found branch parent: (refs/remotes/svn/tags/vendor/5.8) 8fd8668dbed2dde6b55306c71b3b629f5ed794ec
Following parent with do_switch
        A       calendar.gif
        A       date_chooser.js
        M       dd-formmailer.css
        M       dd-formmailer.php
        A       lang/Belarussian.php
        A       lang/Czech.php
        A       lang/Estonian.php
        A       lang/Japanese.php
        A       lang/Vietnamese.php
        M       recaptchalib.php
Successfully followed parent
r7 = cc00cb187386298cf974dda69f151d2ad4795917 (refs/remotes/svn/tags/vendor/5.8)
        A       calendar.gif
        A       date_chooser.js
        M       dd-formmailer.css
        M       dd-formmailer.php
        M       dd-verify.php
        A       lang/Belarussian.php
        A       lang/Czech.php
        A       lang/Estonian.php
        A       lang/Japanese.php
        A       lang/Vietnamese.php
        M       recaptchalib.php
Checking svn:mergeinfo changes since r5: 1 sources, 1 changed
W: Cannot find common ancestor between e65ba2bdcd12a1935c1a327507dad6b7117f452b and 9f95df7ce49c5d1cb4715017d223a8bd1c8dcffc. Ignoring merge info.
r8 = 9a5cd8f55f377f469280171cd87219b8f528c693 (refs/remotes/svn/trunk)
        M       dd-formmailer.php
r9 = 8143782376d9fbc91a9181b367b72701205b017f (refs/remotes/svn/trunk)
        M       dd-formmailer.php
r10 = 6f563fc4c511ddcc53c2ffc5d26c1e116725bc6b (refs/remotes/svn/trunk)
        M       _svn_client_config.txt
r11 = e9130854178d2c2743a981f303cdd7f34e54b052 (refs/remotes/svn/trunk)
svnserve: E210002: Network connection closed unexpectedly
Checked out HEAD:
  svn+ssh://svn@vcs/dd-formmailer/trunk r11

$ git branch -vv --list ; git for-each-ref
* main e913085 Updated client config for Windows Scripts
e9130854178d2c2743a981f303cdd7f34e54b052 commit refs/heads/main
7ab4f436cafc8af3ed2e727a6d8cbef1a8f8b39f commit refs/remotes/svn/tags/vendor/5.4
cc00cb187386298cf974dda69f151d2ad4795917 commit refs/remotes/svn/tags/vendor/5.8
e9130854178d2c2743a981f303cdd7f34e54b052 commit refs/remotes/svn/trunk
9f95df7ce49c5d1cb4715017d223a8bd1c8dcffc commit refs/remotes/svn/vendor/current

Attachment: dd-formmailer-mergeinfo.svndump.7z
Description: application/7z-compressed

Attachment: dd-formmailer-branched.svndump.7z
Description: application/7z-compressed

#!/bin/gawk -f
### GNU awk: https://www.gnu.org/software/gawk/manual/

### analyze-svn-dump-for-cross-branch-copies.sh
###
### Parameters:
### * csv=";" (or ",") - export main information as CSV
### * details=1 - see all copied paths
###
### Copyright (C) 2021  Matthias Bücher, Germany <maddes@xxxxxxxxxx>
###
### This program is free software: you can redistribute it and/or modify
### it under the terms of the GNU General Public License as published by
### the Free Software Foundation, either version 3 of the License, or
### (at your option) any later version.
###
### This program is distributed in the hope that it will be useful,
### but WITHOUT ANY WARRANTY; without even the implied warranty of
### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
### GNU General Public License for more details.
###
### You should have received a copy of the GNU General Public License
### along with this program.  If not, see <https://www.gnu.org/licenses/>.


### ATTENTION! function code has to be adapted to the structure of the repository and its historical changes
function getBranchOfNodePath(nodepath,    branch) {
  branch = ""
  ### --- A) special cases
  ## /vendor/*
  if (match(nodepath, /^vendor\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  ## /tags/vendor/*
  else if (match(nodepath, /^tags\/vendor\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  ### --- B) standard layout: /branches/*, /tags/*, /trunk
  else if (match(nodepath, /^branches\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  else if (match(nodepath, /^tags\/[^/]*/)) {
    branch = substr(nodepath, RSTART, RLENGTH)
  }
  else if (match(nodepath, /^trunk\//)) {
    branch = substr(nodepath, RSTART, RLENGTH-1)
  }
  ### --- C) fallback: remove last component
  else {
    branch = gensub(/\/[^/]+$/, "", 1, nodepath)
  }
  #
  return branch
}


BEGIN {
  FS = "\n"
}

BEGINFILE {
  ## initialize variables
  revision = ""
  nodepath = ""
  nodecopyrev = ""
  nodecopypath = ""
  ## initialize arrays
  delete cross_branch_copies
  delete cross_branch_copies_first
}

/^Revision-number: / {
  revision = $0
  sub(/^Revision-number: /, "", revision)
  sub(/^\s+/, "", revision)
  sub(/\s+$/, "", revision)
  #
  nodepath = ""
  nodecopyrev = ""
  nodecopypath = ""
}

/^Node-path: / {
  nodepath = $0
  sub(/^Node-path: /, "", nodepath)
  #
  nodecopyrev = ""
  nodecopypath = ""
}

/^Node-copyfrom-rev: / {
  nodecopyrev = $0
  sub(/^Node-copyfrom-rev: /, "", nodecopyrev)
}

/^Node-copyfrom-path: / {
  nodecopypath = $0
  sub(/^Node-copyfrom-path: /, "", nodecopypath)
  #
  branchfrom = getBranchOfNodePath(nodecopypath)
  branchto = getBranchOfNodePath(nodepath)
  #
  if (branchfrom != branchto) {
    cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopypath"] = nodecopypath
    cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopyrev"] = nodecopyrev
    if (!((branchfrom, branchto) in cross_branch_copies_first)) {
      cross_branch_copies_first[branchfrom, branchto] = revision
    }
  }
}

ENDFILE {
  foundrevs = length(cross_branch_copies)
  if (foundrevs == 0) {
    printf("=== %s: No revisions found with cross-branch svn copies\n", FILENAME)
  } else {
    if (csv) {
      backupofs=OFS
    }
    printf("=== %s: Found %i revisions with cross-branch svn copies\n", FILENAME, foundrevs)
    if (csv) {
      OFS=csv
      print("\"Revision\"", "\"Branch from\"", "\"Branch to\"")
      OFS=backupofs
    }
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (revision in cross_branch_copies) {
      if (!(csv)) {
        print(">>> Revision:", revision)
      }
      count = 0
      PROCINFO["sorted_in"] = "@ind_str_asc"
      for (branchfrom in cross_branch_copies[revision]) {
        for (branchto in cross_branch_copies[revision][branchfrom]) {
          if (csv) {
            csvrevision = revision
            csvbranchfrom = "\"" gensub(/"/, "\"\"", "g", branchfrom) "\""
            csvbranchto = "\"" gensub(/"/, "\"\"", "g", branchto) "\""
            OFS=csv
            print(csvrevision, csvbranchfrom, csvbranchto)
            OFS=backupofs
          } else {
            printf(" svn copy from \"%s\" to \"%s\"\n", branchfrom, branchto)
            if (details) {
              for (nodepath in cross_branch_copies[revision][branchfrom][branchto]) {
                count++
                printf("  %4i. \"%s\" (Revision %i) to \"%s\"\n", count, cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopypath"], cross_branch_copies[revision][branchfrom][branchto][nodepath]["nodecopyrev"], nodepath)
              } ## nodepath
            } ## details
          } ## csv
        } ## branchto
      } ## branchfrom
    } ## revision
    #
    printf("--- %s: List of first revision of each cross-branch copy\n", FILENAME)
    PROCINFO["sorted_in"] = "@val_num_asc"
    if (csv) {
      OFS=csv
      print("\"Revision\"", "\"Branch from\"", "\"Branch to\"")
      OFS=backupofs
    }
    for (combined in cross_branch_copies_first) { ## combined: branchfrom, branchto
      split(combined, separate, SUBSEP)
      if (csv) {
        csvrevision = cross_branch_copies_first[combined]
        csvbranchfrom = "\"" gensub(/"/, "\"\"", "g", separate[1]) "\""
        csvbranchto = "\"" gensub(/"/, "\"\"", "g", separate[2]) "\""
        OFS=csv
        print(csvrevision, csvbranchfrom, csvbranchto)
        OFS=backupofs
      } else {
        printf(" svn copy from \"%s\" to \"%s\" first in revision %i\n", separate[1], separate[2], cross_branch_copies_first[combined])
      } ## csv
    } ## combined
    #
    count = length(cross_branch_copies_first)
    printf("^^^ %s: Found %i revisions (unique %i) with cross-branch svn copies\n", FILENAME, foundrevs, count)
  } ## foundrevs
}
