Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges

Josh Triplett <josh@xxxxxxxxxxxxxxx> writes:

> Jamey Sharp and I wrote a script called git-split to accomplish this
> repository split. git-split reconstructs the history of a sub-project
> previously stored in a subdirectory of a larger repository. It
> constructs new commit objects based on the existing tree objects for the
> subtree in each commit, and discards commits which do not affect the
> history of the sub-project, as well as merges made unnecessary due to
> these discarded commits.

Very nicely done.

> We would like to acknowledge the work of the gobby team in creating a
> collaborative editor which greatly aided the development of git-split.

> from itertools import izip
> from subprocess import Popen, PIPE
> import os, sys

How recent a Python are we assuming here?  Is late 2.4 recent
enough?

> def walk(commits, new_commits, commit_hash, project):
>     commit = commits[commit_hash]
>     if not(commit.has_key("new_hash")):
>         tree = get_subtree(commit["tree"], project)
>         commit["new_tree"] = tree
>         if not tree:
>             raise Exception("Did not find project in tree for commit " + commit_hash)
>         new_parents = list(set([walk(commits, new_commits, parent, project)
>                                 for parent in commit["parents"]]))
>
>         new_hash = None
>         if len(new_parents) == 1:
>             new_hash = new_parents[0]
>         elif len(new_parents) == 2: # Check for unnecessary merge
>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>                 new_hash = new_parents[0]
>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>                 new_hash = new_parents[1]
>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>             new_hash = None

This is a real gem.  I really like reading well-written Python
programs.
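
To spell out the merge check for readers who have not seen the rest of the script, here is a minimal sketch. The names reaches() and collapse_merge() are hypothetical and are not taken from git-split. It assumes each commit is a dict with a "parents" list, as in the quoted code. It leaves out the final tree comparison that walk() performs before accepting the collapsed parent.

```python
def reaches(commits, descendant, ancestor):
    """Return True if `ancestor` can be reached from `descendant`
    by following parent links (hypothetical helper, not git-split's)."""
    seen = set()
    stack = [descendant]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(commits[current]["parents"])
    return False


def collapse_merge(commits, parents):
    """Return the one parent that makes a merge redundant, or None
    if the merge has to be kept as a real merge."""
    if len(parents) == 1:
        return parents[0]
    if len(parents) == 2:
        first, second = parents
        # If one side already contains the other, the merge adds nothing.
        if reaches(commits, first, second):
            return first
        if reaches(commits, second, first):
            return second
    return None
```

With a history where B and C are both children of A, merging B with A collapses to B, while merging B with C stays a merge.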

When git-rev-list (or the "git-log --pretty=raw" that you use in
your main()) simplifies the merge history based on a subtree, it
looks at each merge, and if the tree matches any of the parents
it discards the other parents and makes the history a single
strand of pearls.  However, for this application that is not what
you want, so I can see why you run full "git-log" and prune
history by hand here like this.
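
As a rough model of that rule (a simplified sketch, not the actual core code), the choice of which parents to follow at a merge can be written as below. The function name parents_to_follow() and the "subtree" key, which stands for the tree object of the path being followed, are assumptions made for illustration.

```python
def parents_to_follow(commits, commit_hash, full_history=False):
    """Model of merge simplification for a path-limited traversal.

    Each commit is assumed to be a dict with "parents" (list of
    hashes) and "subtree" (the tree hash of the path of interest).
    """
    commit = commits[commit_hash]
    parents = commit["parents"]
    if full_history or len(parents) < 2:
        # Not a merge, or the caller asked to keep every branch.
        return list(parents)
    for parent in parents:
        if commits[parent]["subtree"] == commit["subtree"]:
            # The merge took this parent's version unchanged, so
            # only this strand of history is followed.
            return [parent]
    return list(parents)
```

The full_history flag in this model keeps every parent of a merge, which is the behaviour a splitter needs if it wants to decide for itself which merges are redundant.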

I wonder if using "git-log --full-history -- $project" to let
the core side omit commits that do not change the $project (but
still give you all merged branches) would have made your job any
easier?
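
Something along these lines is what I have in mind, as an untested sketch. The helper names are made up, it assumes a Python recent enough to have subprocess.check_output, and whether the output is enough for your script is exactly the open question.

```python
import subprocess


def full_history_argv(project, rev="HEAD"):
    """Build the command line for a path-limited log that keeps
    all merged branches (hypothetical helper)."""
    return ["git", "log", "--full-history", "--pretty=raw",
            rev, "--", project]


def read_full_history(project, rev="HEAD", cwd=None):
    """Run the command in the repository at `cwd` and return the
    raw log text; raises CalledProcessError if git fails."""
    output = subprocess.check_output(full_history_argv(project, rev),
                                     cwd=cwd)
    return output.decode("utf-8", "replace")
```

The existing parser for --pretty=raw output could then be fed the result of read_full_history("some/subdir") instead of the full log.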

You are handling grafts by hand because --pretty=raw is special
in that it displays the real parents (although traversal does
use grafts).  Maybe it would have helped if we had a --pretty
format that is similar to raw but rewrites the parents?
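
In the meantime, one possible workaround, assuming that "git rev-list --parents" prints each commit followed by the parents the traversal actually used (and so already has grafts applied), is to read the parent information from that listing and take only the rest from the raw format. A minimal parser for such a listing could look like this; parse_parents_listing() is a hypothetical name.

```python
def parse_parents_listing(text):
    """Parse lines of the form "commit parent1 parent2 ..." into a
    dict mapping each commit hash to its list of parent hashes."""
    graph = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        graph[fields[0]] = fields[1:]
    return graph
```

A root commit shows up as a line with a single hash and therefore maps to an empty parent list.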

