Re: git-fast-export bug, commits emitted in incorrect order causing parent data to be lost from commits turning essentially linear repo into "islands"

On Thu, Jun 12, 2008 at 02:52:40PM +0200, Michael J Gruber wrote:
> Yves Orton venit, vidit, dixit 12.06.2008 14:16:
>> We want a more or less linear repo as the result. This bug with
>> fast-export was the main showstopper in our efforts.  However, I can
>> imagine that this is a problem that many people will want to solve. It
>> would be nice if there was an easier way to do it than what we currently
>> are doing (merging and munging multiple fast-export streams into a
>> single fast-import process). While at this point it's probably academic,
>> any suggestions as to the Best Way to do this would be very much
>> welcome.
>
> I've done something like this, "stitching" the history of different
> repos together in order to produce one repo, with each of the
> constituents in a subdir. What I did was an adaptation of
>
> http://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html
>
> but as a multistep version:

What Yves and I did was write a script that does the following:
- run git fast-export --all (and --topo-order now) on all the repositories
  we wanted to merge and read blocks from them
- pass through all non-commit blocks (munging paths to put the content of
  each repo in its own directory and renumbering marks to avoid clashes)
- keep a list of the next commit sent by fast-export for each repo
- select the oldest of those commits and send it through, after stitching
  it in at the right place (the point being to determine the "right place")
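
To make the "keep the next commit of each repo, pick the oldest" step
concrete, here is a minimal sketch in Python. It is not our actual script:
the function name, the (commit, timestamp) tuples and the sample data are
all made up for illustration, and it ignores path munging, mark
renumbering and the stitching itself.

```python
from collections import deque


def interleave_oldest_first(streams):
    """Merge per-repo commit streams, always emitting the oldest pending commit.

    `streams` maps a repo name to its commits as (commit_id, timestamp)
    pairs, in the order the exporter produced them. Only the head of each
    stream is ever considered, so the per-repo order is preserved.
    """
    queues = {repo: deque(commits) for repo, commits in streams.items() if commits}
    order = []
    while queues:
        # Pick the repo whose next pending commit is the oldest
        # (ties are broken by repo name to keep the result deterministic).
        repo = min(queues, key=lambda r: (queues[r][0][1], r))
        commit_id, _timestamp = queues[repo].popleft()
        order.append((repo, commit_id))
        if not queues[repo]:
            del queues[repo]
    return order


# Hypothetical data: timestamps are plain integers for readability.
streams = {
    "A": [("A1", 10), ("A2", 30)],
    "B": [("B1", 20), ("B2", 25)],
}
print(interleave_oldest_first(streams))
# [('A', 'A1'), ('B', 'B1'), ('B', 'B2'), ('A', 'A2')]
```

Looking only at the head of each stream is what makes the parent-before-child
ordering of every stream matter: a commit can never be emitted before the
commits that precede it in its own repo.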

Actually, what we are trying to do is produce a single DAG from 2 or
more DAGs, while making sure that each original DAG is preserved inside it.
(I'm pretty sure this is all trivial stuff for graph mathematicians)

Imagine we merged repositories A, B and C into a new repo D: if we replace
all the nodes of D that come from B and C with plain edges, we end up with
the original A graph.

We defined the "right place" as follows: once we have selected the next
commit to add to our new graph, each of its new parents is "the last alien
child of the original parent" (or the original parent itself, if it has no
alien child). Here "alien" means "coming from another repository".

For example, if our new repository being built looks like:

 --A7--A8--B4--B5--A9--B7
            \
             --B6

In this case, A9 was originally attached to A8, but to avoid unnecessary
branching in the new repo, we attached it to B5 instead (the last alien
child of A8, found by descending the tree from A8 in a leftmost manner).
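
One possible reading of the "last alien child" rule, sketched in Python
against the graph above. This is an illustration only, not our script: the
names `repo_of` and `attach_point` are hypothetical, the repo is guessed
from the first letter of the commit name as in the diagram, and "leftmost"
is taken to mean "the first child that was attached to a node".

```python
def repo_of(commit):
    # Assumption for this sketch: "A9" belongs to repo "A", "B5" to repo "B".
    return commit[0]


def attach_point(original_parent, children):
    """Return the node a new commit should be attached to in the merged graph.

    Starting from the original parent, follow the first (leftmost) child as
    long as that child comes from another repo. Stop at the first node that
    has no children, or whose first child belongs to the parent's own repo.
    """
    own_repo = repo_of(original_parent)
    node = original_parent
    while True:
        kids = children.get(node, [])
        if not kids or repo_of(kids[0]) == own_repo:
            return node
        node = kids[0]


# Children of each node in the merged graph, in the order they were attached:
#  --A7--A8--B4--B5--A9--B7, with B6 branching off B4.
children = {
    "A7": ["A8"],
    "A8": ["B4"],
    "B4": ["B5", "B6"],
    "B5": ["A9"],
    "A9": ["B7"],
}

print(attach_point("A8", children))  # B5
print(attach_point("A9", children))  # B7
```

For a merge commit, the same function would simply be applied to each
original parent in turn.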

No A node will ever be attached to B6. The next A node originally
attached to A8 will be attached to B5 again, and one originally attached
to A9 will be attached to B7. Like this:

                 --A10
                /
 --A7--A8--B4--B5--A9--B7--A11
            \
             --B6

Now, if we remove all B nodes, we get this:

         --A10
        /
 --A7--A8--A9--A11

which is the original A graph.
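
The "remove all B nodes" check can be sketched the same way. Again this is
only an illustration under assumptions: `project_onto` is a made-up name,
the repo is read from the first letter of the commit name, A7 is treated
as a root because its ancestors are not shown in the diagram, and alien
nodes are followed through their first parent only (which is enough for
this graph, where no alien node is a merge).

```python
def repo_of(commit):
    # Assumption for this sketch: "A10" belongs to repo "A", "B5" to repo "B".
    return commit[0]


def project_onto(repo, new_parents):
    """Drop every node that is not from `repo` and reconnect the survivors.

    `new_parents` maps each node of the merged graph to its list of parents.
    Each parent of a surviving node is replaced by the nearest ancestor that
    belongs to `repo`, found by walking up through the alien nodes.
    """
    def nearest_own(node):
        while repo_of(node) != repo:
            parents = new_parents.get(node, [])
            if not parents:
                return None  # ran off the top of the graph
            node = parents[0]  # simplification: alien nodes have one parent here
        return node

    projected = {}
    for node, parents in new_parents.items():
        if repo_of(node) != repo:
            continue
        mapped = [nearest_own(p) for p in parents]
        projected[node] = [m for m in mapped if m is not None]
    return projected


# The merged graph from the second diagram, as node -> parents.
new_parents = {
    "A7": [],
    "A8": ["A7"],
    "B4": ["A8"],
    "B5": ["B4"],
    "B6": ["B4"],
    "A9": ["B5"],
    "B7": ["A9"],
    "A10": ["B5"],
    "A11": ["B7"],
}

print(project_onto("A", new_parents))
# {'A7': [], 'A8': ['A7'], 'A9': ['A8'], 'A10': ['A8'], 'A11': ['A9']}
```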

Finding the "last alien child" works fine with merges, too.

Of course, some commits from A might end up on an unrelated branch of B,
but all B branches are irrelevant to A anyway! :-)

-- 
 Philippe Bruhat (BooK)

 People are all unique- but some are more unique than others.
                                    (Moral from Groo The Wanderer #22 (Epic))