Re: Best practices for updating old repos

Michael Eager <eager@xxxxxxxxxx> · Thu, 15 Jun 2017 23:24:20 -0700

Thanks for your comments.

On 06/15/2017 07:57 PM, Michael O'Cleirigh wrote:
Hi Michael,

In git, if you don't merge often, you end up in these merge-conflict-hell situations.

In my experience the main conflicts come not from the unified diff of those 130 commits but from
differences in the surrounding code.

Merging, rebasing, or cherry-picking directly onto the latest upstream sounds impossible to me.

These conflicts come from the distance between the local fork branch and the upstream branch.

You need to merge through closer commits first to have a hope of getting something automatic to work.

Something like getting the list of releases made upstream in the last 5 years and merging
them in order into the fork branch.

i.e. merge v1, merge v2, ... merge v300
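
As a rough sketch of that release-by-release merge (assuming the upstream releases are tagged with names matching `v*` and are already fetched; the function name `merge_releases` is only for illustration), something like this could be run from the fork branch:

```shell
#!/bin/sh
# Merge upstream release tags into the current branch one at a time,
# oldest first, stopping at the first tag that does not merge cleanly.
# Assumption: release tags match the given pattern (default 'v*') and
# contain no whitespace.
merge_releases() {
    pattern=${1:-'v*'}
    for tag in $(git tag --list "$pattern" --sort=version:refname); do
        # Skip releases that are already contained in the current branch.
        if git merge-base --is-ancestor "$tag" HEAD; then
            continue
        fi
        echo "Merging $tag"
        if ! git merge --no-edit "$tag"; then
            echo "Conflict while merging $tag; resolve, commit, then re-run." >&2
            return 1
        fi
    done
}
```

Because tags that are already merged are skipped, the function can simply be called again after each round of manual conflict resolution.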

I went through something similar with a Subversion repo we converted to git.

In Subversion they were cherry-picking finished work into a release branch.

In git a feature branch mode was being used.

It turned out some commits were never cherry picked and bringing them to the latest release was hard.

We tried many of the approaches you outlined, took what git would give us automatically and in the
most hairy cases recreated the changes on the latest upstream by reading the diff of the original
commit and rewriting it on the latest code.

In terms of how the history looks after the merge conflicts are resolved, you could fold the
fixups into a single commit applied onto the original fork branch, so that history would show the
130-commit branch merged directly into the upstream.

You would use the git-commit-tree command to reuse the merged tree id and then use it as a merge
commit between the 130th commit id and the upstream commit id.

Regards,

Michael

On Thu, Jun 15, 2017 at 8:52 PM, Michael Eager <eager@xxxxxxxxxx> wrote:

    Hi All --

    I'm working with code that is based on a five year old repository.
    There are 130 local commits since the repo was forked.  Naturally,
    the upstream project has moved on significantly.

    I'm wondering about best approaches to updating the repo to the
    current upstream version.  Here are the approaches I've considered:

    - Rebase from upstream.  Likely almost every patch will fail with
       multiple merge conflicts.

    - Merge local branch into upstream.  Likely many merge failures, but
       fewer than with rebase.

    - Apply individual patches from the old repo to the upstream repo.
       Fix merge conflicts, rebuild, fix build failures.  There may be
       some duplication and additional merge problems created, where a
       later patch from the old repo fixes the same conflict or build
       failure.

    I've tried each of these approaches on various projects.  Each has
    problems. After resolving merge issues there are build failures which
    need to be resolved and additional patches created.  The result is
    that the patch history is a bit chaotic, where there are later patches
    which fix problems with early patches.  I've tried to sort the fix
    patches to follow the patch they correct, so that the fixes were
    together and I could merge them, but that can be difficult.

    I've used Stacked Git a little, but don't know if it will make
    any of this easier.

    On some projects, I've reimplemented changes in the upstream repo,
    abandoning the patch history from the old repo:

    - Create diff of old repo and upstream.  Apply only the changes
       to add new functionality, which are in the patches to the
       old repo.  Fix problems caused by API changes, renamed files, etc.

    - Re-implement the changes on the upstream repo.  Some of the old
       code would be re-used, but modified to fit in the current upstream.
       Some new code would be written.

    One other variant of the rebase approach I've thought of is to do
    this incrementally, rebasing the old repo against an upstream commit
    a short time after the old repo was forked, fixing any conflicts,
    rebuilding and fixing build failures.  Then repeat, with a bit
    newer commit.  Then repeat, until I get to the top.  This sounds
    tedious, but some of it can be automated.  It also might result in
    my making the changes compatible with upstream code which was later
    abandoned or significantly changed.
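
The incremental rebase described above might be automated roughly as follows. This is only a sketch: the function name `rebase_in_steps` and the idea of taking every Nth first-parent upstream commit as a checkpoint are assumptions, and the rebuild step after each rebase is left out.

```shell
#!/bin/sh
# rebase_in_steps UPSTREAM [STEP]
# Rebase the current branch onto every STEP-th upstream commit after the
# fork point (oldest first), and finally onto the upstream tip itself.
# Stops at the first rebase that needs manual conflict resolution.
rebase_in_steps() {
    upstream=$1
    step=${2:-50}
    base=$(git merge-base HEAD "$upstream") || return 1
    # Checkpoints: every STEP-th commit on upstream's first-parent history.
    checkpoints=$(git rev-list --reverse --first-parent "$base..$upstream" |
        awk -v n="$step" 'NR % n == 0')
    for target in $checkpoints "$(git rev-parse "$upstream^{commit}")"; do
        echo "Rebasing onto $target"
        if ! git rebase "$target"; then
            echo "Stopped at $target; fix conflicts, run 'git rebase --continue', then re-run." >&2
            return 1
        fi
    done
}
```

After a stop, finishing the rebase by hand and calling the function again recomputes the fork point, so it carries on from wherever the branch now sits.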

    Anyone have a different approach that I should consider?  Or maybe
    offer advice on how to make one of these approaches work better?
    What is best practice to update an old repo?

    --
    Michael Eager eager@xxxxxxxxxxxx
    1960 Park Blvd., Palo Alto, CA 94306 650-325-8077

--
Michael Eager	 eager@xxxxxxxxxxxx
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077