Hi Eric et al.,

While using git-svn to work with a repository that has a very complex history, I discovered a very unfortunate behavior.

In general, when a branch was derived (copied) from somewhere else, git-svn follows this parent branch and imports it. If multiple branches do that, git-svn detects that the corresponding parent branch has already been imported and reuses the imported data. Unfortunately, when the parent directory in the svn repository is not tracked as a branch in the svn-remote section of the config file (for instance, when it is just a subdirectory of a tracked branch), this situation is no longer detected, and the parent branch is imported multiple times with the same result. In a large repository this can increase import time drastically.

My analysis (as far as I understand the code) is that this happens because the map files in .git/svn are indexed by their ref name in the git repository. Untracked branches are indexed by the ref name of the branch that follows them, followed by @XX, where XX is the revision number of the branch point. With that scheme, the index names for two branches following a common parent tree differ, and thus an already imported tree is not detected.

My thought was that this could potentially be fixed by indexing those map files not by their ref name in the git repository but by their location in the original svn repository. Since my understanding of the git-svn code is not good enough to judge all the consequences of such a design change, I'd like to ask whether you think this change would be a good idea, or whether I have overlooked a fundamental problem that makes it impossible (or at least hard) to implement.

Since my description of the problem might be a bit confusing without an example, I created a very small svn repository that shows this problem. An svn repository dump for it is attached.
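To make the two indexing schemes concrete, here is a minimal sketch in Python. It is only a toy model of the lookup as I understand it, not git-svn's actual code: the function names, the tuple layout, and the "path@rev" key format for the proposed scheme are all assumptions made for illustration. The refs, path, and revision are taken from the example repository.

```python
# Toy model of the map-file lookup; NOT git-svn's real implementation.
# All names and the proposed key format are hypothetical.

def key_by_ref(following_ref, svn_path, rev):
    """Current scheme as described above: <following ref name>@<rev>."""
    return f"{following_ref}@{rev}"


def key_by_svn_path(following_ref, svn_path, rev):
    """Proposed scheme (assumed format): index by location in the svn repo."""
    return f"{svn_path}@{rev}"


def count_imports(key_func, branch_points):
    """Count how often a parent tree gets imported under a given keying."""
    already_imported = set()
    imports = 0
    for following_ref, svn_path, rev in branch_points:
        key = key_func(following_ref, svn_path, rev)
        if key not in already_imported:
            # No map file found under this key, so the parent is imported.
            already_imported.add(key)
            imports += 1
    return imports


# Both branches were copied from the untracked directory trunk/x at r2.
branch_points = [
    ("refs/remotes/foo1", "trunk/x", 2),
    ("refs/remotes/foo2", "trunk/x", 2),
]

print(count_imports(key_by_ref, branch_points))       # 2
print(count_imports(key_by_svn_path, branch_points))  # 1
```

With ref-based keys the two lookups use refs/remotes/foo1@2 and refs/remotes/foo2@2 and never meet; with path-based keys both resolve to the same entry. The sketch ignores everything else the map files are used for, which is exactly the part I cannot judge.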
When importing this repository using the svn-remote section

    [svn-remote "svn"]
        url = file:///dev/shm/x/svn1
        fetch = trunk:refs/remotes/trunk
        branches = branches/*:refs/remotes/*
        tags = tags/*:refs/remotes/tags/*

you will get the following behavior during the import:

    $ git svn init -s file:///dev/shm/x/svn1
    Initialized empty Git repository in /dev/shm/x/git2/.git/
    $ git svn fetch
    r1 = 7920f3e7e70c9bb9d8a7caf28830c7ed205c20c6 (refs/remotes/trunk)
        A   x/alpha
    r2 = db7ad1b41f1d2ad18d198b9a80d2606b27557faf (refs/remotes/trunk)
        A   x/beta
    r3 = a35cab9c510f66d96437f21ecb738c93e0c6b793 (refs/remotes/trunk)
    Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo1, 2
    Initializing parent: refs/remotes/foo1@2
        A   alpha
    r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo1@2)
    Found branch parent: (refs/remotes/foo1) 5584693b5216dc1fa05f56455c67dfd61093ee43
    Following parent with do_switch
        A   beta
    Successfully followed parent
    r4 = d0cb7cfc1f69e52ecd39d8eb67518abe136b53d3 (refs/remotes/foo1)
    Found possible branch point: file:///dev/shm/x/svn1/trunk/x => file:///dev/shm/x/svn1/branches/foo2, 2
    Initializing parent: refs/remotes/foo2@2
        A   alpha
    r2 = 5584693b5216dc1fa05f56455c67dfd61093ee43 (refs/remotes/foo2@2)
    Found branch parent: (refs/remotes/foo2) 5584693b5216dc1fa05f56455c67dfd61093ee43
    Following parent with do_switch
        A   beta
    Successfully followed parent
    r5 = 181cb81070b816bef74adefa1bc4c451100a5eef (refs/remotes/foo2)
    Checked out HEAD:
      file:///dev/shm/x/svn1/trunk r3

As you can see, file:///dev/shm/x/svn1/trunk/x is imported twice: once as refs/remotes/foo1@2 and once as refs/remotes/foo2@2, both yielding the same commit 5584693b. For this small repository that is not a big issue, but if this tree had a deep history in a large repository, you would want to avoid it.

Robert
Attachment:
svndump.gz
Description: GNU Zip compressed data