On Jun 2, 2008, at 3:42 AM, Eric Wong wrote:
Kevin Ballard <kevin@xxxxxx> wrote:
On Jun 1, 2008, at 10:40 PM, Eric Wong wrote:
Kevin Ballard <kevin@xxxxxx> wrote:
On Jun 1, 2008, at 10:00 PM, Eric Wong wrote:
Kevin Ballard <kevin@xxxxxx> wrote:
I started a git-svn clone on a large svn repository, and I noticed
that for various branches, it kept pulling down the exact same
revisions (starting at r1). In other words, if I had 4 branches that
shared common history, their common history all got pulled down 4
times. I double-checked, and the created commit objects were
identical.
Why was git-svn pulling down the same revisions over and over, when it
already knows it has a commit object for those revisions?
Can you give me an example of a repository and command line you used
that does this? Did you use 'git svn clone -s' or did you manually
specify the branch locations in the repo?
It could even be a lack of read permissions to the repository root
that would cause things like this.
The repository is, unfortunately, a private repo, so I can't share it.
I used `git svn clone -s` to clone it. I have the SVN Perl bindings
v1.4.4 (according to git svn --version).
I definitely have read permissions to the repo root. If I specify to
only fetch -r 12000:HEAD (there are 14000-odd revisions), it doesn't
pull down any duplicates, but when I let it start from the root, it
pulls down hundreds of duplicates for multiple branches.
Can you at least send me the 'svn log -v' output for that repo?
Feel free to leave out the actual log messages and munge the path
names if you can't expose that information.
I'll have to do it tomorrow when I'm at the office. How much log info
do you need? I can let it run until I see duplicate revisions (it's
pretty obvious, it starts over again from r1).
I'll need the revisions where branches were created from
the common ancestor (presumably trunk) and some revisions
before it.
For debugging problems with restricted repositories, it may be worth it
to create a repository skeleton cloning tool that just reads the output
of 'svn log --xml -v' and recreates a new SVN repository with:
* all log messages stripped
* all new files are created with just a random string in them (to
  throw off rename detection on the git side)
  (except symlinks, see below)
* all path components tokenized and each token replaced with
  a dictionary value. Something like:

      @tmp = map { $tok{$_} ||= ++$i; $tok{$_} } split(/\//, $old_path);
      $new_path = join('/', @tmp);

  This way all copy history can be preserved.
* all modified files will just get a random byte appended to them
* all committer names replaced with a dictionary value (similar to
  what is done to path components).
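The replacement rules in the list above could look roughly like the
following Perl helpers. This is a minimal sketch, not an existing tool:
the function names (anonymize_path, anonymize_author, random_content,
modify_content) and the "userN" naming scheme are hypothetical, and
parsing 'svn log --xml -v' and replaying the commits into a new
repository are left out. Unlike the one-liner, it keeps empty path
components so a leading '/' survives.

```perl
#!/usr/bin/perl
# Hypothetical anonymizing helpers for a repository skeleton tool.
# Reading 'svn log --xml -v' and committing to a new repo is not shown.
use strict;
use warnings;

my (%path_tok, %author_tok);             # original name -> replacement
my ($next_path, $next_author) = (0, 0);  # last token handed out

# Replace every path component with a numeric token. The same component
# always maps to the same token, so copyfrom paths still line up and the
# copy history is preserved. Empty components (from a leading "/") are
# kept so absolute paths stay absolute.
sub anonymize_path {
    my ($old_path) = @_;
    my @tmp = map { $_ eq '' ? '' : ($path_tok{$_} ||= ++$next_path) }
              split(m{/}, $old_path);
    return join('/', @tmp);
}

# Replace committer names with a stable dictionary value.
sub anonymize_author {
    my ($name) = @_;
    $author_tok{$name} ||= 'user' . ++$next_author;
    return $author_tok{$name};
}

# Contents for a newly added file: 32 random hex digits, so git's
# rename detection cannot pair unrelated files by content.
sub random_content {
    return join('', map { sprintf('%02x', int(rand(256))) } 1 .. 16) . "\n";
}

# Contents for a modified file: the old contents plus one random byte.
sub modify_content {
    my ($old) = @_;
    return $old . chr(int(rand(256)));
}

print anonymize_path('/trunk/lib/foo.c'), "\n";                 # prints /1/2/3
print anonymize_path('/branches/css_refactor/lib/foo.c'), "\n"; # prints /4/5/2/3
print anonymize_author('kevin'), "\n";                          # prints user1
```

Because each component always gets the same token, a file copied from
trunk into a branch keeps the same "2/3" suffix after anonymization, so
the branch points git-svn has to detect remain visible.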
Isn't there a script somewhere that's supposed to do this? Do you know
where it is?
Incidentally, I just checked and when I start the git-svn clone, it
starts pulling down revisions for the branch 'css_refactor@1559' (odd
branch name, but it claimed to find multiple branch points for this
'css_refactor' branch). My guess is that when it starts working on the next
branch, it doesn't view it as related to css_refactor and starts
pulling down the revisions again even though those revisions actually
belonged to trunk.
-Kevin
--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html