Re: git-svn: ignoring a bogus svn revision ?

On Sun, May 3, 2009 at 12:24, Nicolas Noble <pixel.nobis@xxxxxxxxx> wrote:
> Hello,
>
>  I'm a frantic user of git-svn at work, as my company uses SVN as a
> SCM for all of our projects. I've been very happily using git-svn over
> our repositories since quite some time, and I think I managed to
> convert a couple of colleagues by doing so.
>
>  But now I'm facing an interesting issue. Our biggest repository has
> over 102600 revisions, and initializing from scratch the git
> repository up to this revision takes approximately 2 days of computer
> work. The trouble begins with revision 102601... Up so far the
> repository was in a perfect shape, and no one did any huge mistake. On
> revision 102601 however, someone branched... /. Not /trunk, but /,
> which means the whole nine yards, containing tags, branches, and
> trunk.
>
>  So I'm stuck with this:
>
> repository-up-to-102600$ git branch -r | wc -l
> 1824
>
>  And when I issue a git-svn fetch --no-follow-parent --revision
> 102601, git-svn starts trying to fetch all of these 1824 branches and
> tags, obviously. I had to add the --no-follow-parent otherwise git-svn
> would just go nuts. Now if it wasn't for the lousy svn server
> disconnecting me after 2 or 3 days of work trying to check out this
> branch, git-svn would probably manage to get over it. But the server
> can't keep up it seems, and I eventually get disconnected, which means
> I have to do it all over again.
>
>  Of course, this svn commit is undeniably bogus, and no one will ever
> be able to check it out and work on it for real. However, this doesn't
> disrupt the usual course of work for other SVN users, but just
> prevents anyone from using git-svn ever again on this repository.
>
>  I've tried to look into how to remove/amend this svn revision
> directly onto the server, but even if some documentation was telling
> me how to do so (which isn't the case), our IS crew is just too
> stubborn, and fills my requests back with "Please use a SVN client
> supported by the IS crew; you can find links on this page...". I'm
> still going to try to go through this path though, as I feel this is
> probably the right thing to do, (apart from holding my boss hostage in his
> office until they finally decide to switch over to git completely, and
> stop using a piece of software that allows you to push bogus commits)
>
>  So on the other hand, I'm looking at how git-svn works, and I'm
> trying to jump around this utterly bogus svn revision. I haven't found
> any information at all about how I could skip this revision, and even
> though I'm all willing to make some tests on the git repository itself
> by tweaking it, I'd hate to leave it in a broken state that would
> explode in my hands several weeks/months later. Thus I think I'd better
> ask the authors directly, in order to know how I could achieve that in
> a way that isn't too disruptive.

I should probably say now that it's a good idea to back up your git
repository, by making a plain old directory copy, before trying any of
the stuff here; especially given that you're up to 100000 revisions, I
wouldn't want you to have to wait two more days because I gave bad
advice.  Using cp is necessary to back up the metadata related to
git-svn which isn't transferred through a normal git clone.
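
A minimal sketch of such a backup, assuming a cp that supports -a (GNU
and BSD both do). The function name backup_repo and the directory names
in the usage line are only illustrations; the point is that a recursive
copy picks up .git/svn, where git-svn keeps its own bookkeeping.

```shell
# Hypothetical helper: copy a git-svn repository, metadata included.
# Usage: backup_repo SOURCE_DIR BACKUP_DIR
backup_repo() {
    src=$1
    dst=$2
    # Never clobber an earlier backup by accident.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing $dst" >&2
        return 1
    fi
    # -a recurses and preserves permissions, timestamps and symlinks,
    # so .git/svn is copied along with everything else.
    cp -a "$src" "$dst"
}
```

For example: backup_repo repository-up-to-102600 repository-up-to-102600.bak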

Fortunately there's a relatively painless way to avoid the bad svn
revision, by using the -r$m:$n flag to git svn fetch, where $m and $n
are integers.  If you do e.g.

$ git svn fetch -r1:102600
$ git svn fetch -r102602:HEAD

git-svn will happily skip over the bad revision 102601, and fetch
102602 up to the latest svn revision, creating a history with the same
tree content but skipping that commit.  From a git mindset, the
general case of this is very much like squashing several commits into
one: the resultant tree is the same but some history has been omitted
or 'cleaned up'.  Note that HEAD on the command line here is *not* the
same as git's HEAD; it is rather an svn keyword meaning the latest
revision number.

There are potentially two complications with this approach; neither is
very serious, and the bottom line is that you might have to keep
running fetch commands like the above, with an explicit revision
range, instead of omitting the range entirely.

The first complication is that your svn repository will most likely
revert commit 102601 by deleting the offending branch in a later
commit.  In this case, you have not one, but two bad revisions to
avoid.  Not to worry: one can simply grab the range in between the two
bad commits as illustrated above; suppose the revert happened in
commit 102644:

$ git svn fetch -r1:102600 # up to bad branch creation
$ git svn fetch -r102602:102643 # between bad create and delete
$ git svn fetch -r102645:HEAD # up to most recent

Note that the revision numbers specified with -r are inclusive on both
ends, that is, you'll get commits corresponding to both the low and
high revision number (if the commits are interesting to your view of
svn).  Additionally, you should be able to see how to generalise this
from here, for n arbitrary bad commits.
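
The generalisation can be sketched as a small shell function that only
prints the fetch commands, so they can be checked before anything is
run.  The name print_fetch_commands is made up for this illustration,
and it assumes the bad revisions are given in ascending order.

```shell
# Hypothetical helper: print the fetch commands that skip given revisions.
# Usage: print_fetch_commands FIRST_REV BAD_REV [BAD_REV...]
# The BAD_REV arguments must be in ascending order.
print_fetch_commands() {
    start=$1
    shift
    for bad in "$@"; do
        end=$((bad - 1))
        # Ranges are inclusive on both ends; skip empty ones, which
        # happen when two bad revisions are adjacent.
        if [ "$start" -le "$end" ]; then
            echo "git svn fetch -r$start:$end"
        fi
        start=$((bad + 1))
    done
    # HEAD here is the svn keyword for the latest revision, not git's HEAD.
    echo "git svn fetch -r$start:HEAD"
}
```

Running print_fetch_commands 1 102601 102644 prints the same three
commands as above.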

The second complication, and I only mention this because I can't test
the behaviour right now, is that even after fetching up past the bad
revision(s) to svn HEAD, you may need to continue to specify an
explicit revision range to avoid the bad commit(s).  This is because
git-svn tries to fetch from the revision of the latest commit it
obtained from svn when you don't specify an explicit revision range.
Consequently, if none of the revisions after the bad commit(s) were
interesting to you (perhaps they were on other projects in the same
svn repo, or in branches you aren't interested in), then when you go
to run git svn fetch again, your repository hasn't 'recorded' any
history which omits the bad commits, since there hasn't been anything
to fetch yet.  In this case, you need to continue to use the explicit
revision numbers, as appropriate, until you have actual commits
corresponding to later revisions than the bad ones.

Actually, it is stricter than that: to be safe, use the explicit
revision numbers until your git view of svn trunk has a commit after
the bad revision number(s).  In other words, it's not enough for some
branches to get commits which come after the bad revision; until trunk
moves forward, you should continue using the explicit revision range.
It's possible I'm mistaken on this necessity, but better safe than
sorry; if you want to experiment on your repository, feel free.
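
One way to check how far trunk has got is to read the git-svn-id line
that git-svn appends to each imported commit message (it is absent if
the repository was set up with --no-metadata).  A rough sketch; the
function name svn_rev_from_message is made up here, and the ref name in
the usage line is an assumption that depends on your layout.

```shell
# Hypothetical helper: read a commit message on stdin and print the svn
# revision from its git-svn-id line, or nothing if there is none.
svn_rev_from_message() {
    # The line looks like: git-svn-id: <url>@<revision> <repository-uuid>
    # Leading spaces are allowed so indented "git log" output also works.
    sed -n 's/^ *git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p' | tail -n 1
}
```

For example, git log -1 --format=%B refs/remotes/trunk | svn_rev_from_message
prints the svn revision of the newest trunk commit; once that number is
larger than the last bad revision, the condition above is met.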

Other things to consider when doing this, perhaps more for others
reading along: I strongly discourage doing this for anything but
pathological cases like yours; that is, don't be tempted to tidy up
svn history by skipping commits, perhaps saying to yourself 'but the
trees end up the same, and those svn users probably would've squashed
these commits themselves if only they were using a DVCS'.  You will
regret it.  Doing so would be akin to cloning a public git repository,
squashing already-public commits together, and expecting others to
follow how your clone and the origin are related.  Skipping svn
revisions should rather be viewed as a nuclear option, used only when
there's really truly no other way to progress, like your case.

Let me say it again, because I've seen plenty of people on IRC who
think that git-svn is hanging, when it's not, and I don't want them to
think that the way around this supposed hang is to skip the nasty
revision: Don't do it unless you're absolutely sure you need to.

Hope that helps,

Deskin Miller