Re: Transform log message during migration svn -> git (using git-svn)

On Tue, Jun 20, 2017 at 02:46:22PM +0200, Lars Schneider wrote:

> 
> > On 20 Jun 2017, at 14:32, <paul.mattke@xxxxxxx> <paul.mattke@xxxxxxx> wrote:
> > 
> > Well, this is a possibility, of course. Our problem is that our SVN
> > repository currently contains about 220,000 revisions. A colleague of
> > mine said that the command you suggest might take about 4 seconds per
> > revision, so it would take about 10 days to do this for our whole
> > repository. It could therefore save a lot of time if such an operation
> > could be done directly during git-svn.
> 
> Your colleague is most likely correct. I suggested it as this is a one-time
> operation and therefore still somewhat practical from my point of view.

I didn't follow this whole thread, but I happened to see this bit. I
think the command in question is:

  git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

I know filter-branch is slow, but a msg-filter should be relatively
fast. I'd be surprised at 4 seconds per revision (the main cost is
kicking off a new perl process per revision). It's more like 120
commits/sec on my machine.
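To put those two rates side by side for the 220,000 revisions mentioned above, here is a rough back-of-the-envelope sketch. The helper name estimate_seconds is made up for illustration, and the rates are just the figures quoted in this thread, so real numbers will vary with the machine and the repository.

```shell
# Hypothetical helper: seconds needed for $1 commits at $2 commits/sec
# (integer division, so results are rounded down).
estimate_seconds() {
  echo $(( $1 / $2 ))
}

revisions=220000

# 4 seconds per revision, as estimated by the colleague
echo "at 4 s/commit:    $(( revisions * 4 / 86400 )) days"
# about 120 commits/sec, as measured for the msg-filter
echo "at 120 commits/s: $(( $(estimate_seconds "$revisions" 120) / 60 )) minutes"
```

That works out to about 10 days versus about 30 minutes.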

However, I think the fastest way would be to do it with fast-export,
where you can just tweak the stream as it flows through:

  # set up a new repo to hold the results; we won't bother
  # copying the blobs, so just point at the current repo as an
  # alternate.
  git init fixed-repo
  echo "../../../.git/objects" >fixed-repo/.git/objects/info/alternates

  git fast-export --no-data --all |
  perl -ne '
	# look for "data" chunks which contain the commit message
	if (/^data (\d+)/) {
		read STDIN, my $buf, $1;
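		# /mg rewrites every line that starts with T<digits>,
		# matching the per-line behavior of the msg-filter
		$buf =~ s/^T(\d+)/#$1/mg;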
		print "data ", length($buf), "\n";
		print $buf;
	} else {
		print;
	}
  ' |
  git -C fixed-repo fast-import

That runs at about 3600 commits/sec on my machine, so 220,000 revisions
would take roughly a minute.

Most of that time goes to doing a tree diff on each commit. Technically
that is not required for your use case, but I don't think there's a way
to get fast-export to skip that (and it's an inherent part of the
fast-import stream). It's probably fast enough, but it's possible that
a specialized tool like BFG repo cleaner[1] could do better (I don't
know offhand if it handles commit message rewrites or not).
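Whichever tool you use, it is probably worth a quick sanity check on the result before throwing away the original. The following is a minimal sketch, not something I have run against your repository: the helper names count_unconverted and check_rewrite are made up, and it assumes the rewritten history lives in fixed-repo as in the fast-export example.

```shell
# Hypothetical helper: count subject lines on stdin that still start
# with "T<digits>". grep -c exits non-zero when the count is 0, hence
# the "|| true".
count_unconverted() {
  grep -cE '^T[0-9]+' || true
}

# Hypothetical helper: compare the original repo ($1) with the
# rewritten one ($2). The commit counts should be identical, and no
# subject should be left unconverted.
check_rewrite() {
  orig=$(git -C "$1" rev-list --all --count)
  new=$(git -C "$2" rev-list --all --count)
  echo "commits: original=$orig rewritten=$new"
  echo "unconverted subjects: $(git -C "$2" log --all --format=%s | count_unconverted)"
}

# Usage, from the top level of the original repository:
#   check_rewrite . fixed-repo
```

This only looks at subject lines, so it will not notice a "T123" at the start of a later line in the message body.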

-Peff

[1] https://rtyley.github.io/bfg-repo-cleaner/


