Re: [RFC] git-split: Split the history of a git repository by subdirectories and ranges

Junio C Hamano wrote:
> Josh Triplett <josh@xxxxxxxxxxxxxxx> writes:
>> Jamey Sharp and I wrote a script called git-split to accomplish this
>> repository split. git-split reconstructs the history of a sub-project
>> previously stored in a subdirectory of a larger repository. It
>> constructs new commit objects based on the existing tree objects for the
>> subtree in each commit, and discards commits which do not affect the
>> history of the sub-project, as well as merges made unnecessary due to
>> these discarded commits.
> 
> Very nicely done.

Thanks!

>> We would like to acknowledge the work of the gobby team in creating a
>> collaborative editor which greatly aided the development of git-split.
> 
>> from itertools import izip
>> from subprocess import Popen, PIPE
>> import os, sys
> 
> How recent a Python are we assuming here?  Is late 2.4 recent
> enough?

We ran it with 2.4, so yes.  git-split does require at least 2.4,
though, because it uses the set() builtin, str.rsplit(), and the
subprocess module, none of which existed in 2.3.
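
For illustration, a minimal sketch of how a script could enforce the 2.4
requirement at startup.  The names MINIMUM_PYTHON and meets_minimum are
hypothetical and are not taken from git-split itself.

```python
import sys

# Assumption: (2, 4) reflects the requirement stated in this reply
# (set(), str.rsplit(), and subprocess all arrived in Python 2.4).
MINIMUM_PYTHON = (2, 4)

def meets_minimum(version, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) pair in `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)

# Bail out early with a readable message instead of failing later
# on a missing builtin or module.
if not meets_minimum(sys.version_info):
    sys.exit("git-split needs Python %d.%d or later" % MINIMUM_PYTHON)
```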

>> def walk(commits, new_commits, commit_hash, project):
>>     commit = commits[commit_hash]
>>     if not(commit.has_key("new_hash")):
>>         tree = get_subtree(commit["tree"], project)
>>         commit["new_tree"] = tree
>>         if not tree:
>>             raise Exception("Did not find project in tree for commit " + commit_hash)
>>         new_parents = list(set([walk(commits, new_commits, parent, project)
>>                                 for parent in commit["parents"]]))
>>
>>         new_hash = None
>>         if len(new_parents) == 1:
>>             new_hash = new_parents[0]
>>         elif len(new_parents) == 2: # Check for unnecessary merge
>>             if is_ancestor(new_commits, new_parents[0], new_parents[1]):
>>                 new_hash = new_parents[0]
>>             elif is_ancestor(new_commits, new_parents[1], new_parents[0]):
>>                 new_hash = new_parents[1]
>>         if new_hash and new_commits[new_hash]["new_tree"] != tree:
>>             new_hash = None
> 
> This is a real gem.  I really like reading well-written Python
> programs.

Thanks.  We had some fun writing this; git's elegant repository
structure made it a joy to work with.
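
The merge-collapsing step in the quoted walk() can be tried on a toy
graph.  The sketch below is an illustration, not the git-split source:
is_ancestor() is not shown in the quoted excerpt, so its body here is an
assumption, as is the argument order (first the descendant, then the
candidate ancestor), which is inferred from how walk() uses the result.
The helper name collapse_merge and the toy commit ids are hypothetical.

```python
def is_ancestor(commits, descendant, ancestor):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    seen = set()
    stack = [descendant]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(commits[current]["parents"])
    return False

def collapse_merge(commits, parents, tree):
    """Return an existing commit that can replace a commit with these
    rewritten parents and this subtree, or None if a new commit is needed."""
    unique = list(dict.fromkeys(parents))  # drop duplicate parents, keep order
    candidate = None
    if len(unique) == 1:
        candidate = unique[0]
    elif len(unique) == 2:
        # A merge of a commit with its own ancestor adds no history.
        if is_ancestor(commits, unique[0], unique[1]):
            candidate = unique[0]
        elif is_ancestor(commits, unique[1], unique[0]):
            candidate = unique[1]
    # Reuse the candidate only if the subtree is unchanged.
    if candidate is not None and commits[candidate]["new_tree"] != tree:
        candidate = None
    return candidate

# Toy rewritten history: B and C both branch from A.
toy = {
    "A": {"parents": [], "new_tree": "t1"},
    "B": {"parents": ["A"], "new_tree": "t2"},
    "C": {"parents": ["A"], "new_tree": "t3"},
}
```

With this toy graph, collapse_merge(toy, ["A", "B"], "t2") returns "B"
because A is already an ancestor of B and the subtree matches, while
collapse_merge(toy, ["B", "C"], "t2") returns None because that merge
joins two real lines of development.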

> I wonder if using "git-log --full-history -- $project" to let
> the core side omit commits that do not change the $project (but
> still give you all merged branches) would have made your job any
> easier?

I don't think it would.  We still need to know what commit to use as the
parent of any given commit, so we don't want commits in the log output
with parents that don't exist in the log output.  And rewriting parents
in git-log based on which revisions change the specified subdirectory
seems like a bad idea.
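
To make the concern concrete, here is a rough sketch that scans
--pretty=raw style output and reports commits whose listed parents are
missing from the same output.  The function name dangling_parents and
the short commit ids in the sample are made up for the example; real
output carries full hashes.

```python
def dangling_parents(raw_log):
    """Map each listed commit to those of its parents absent from the listing."""
    parents = {}
    current = None
    for line in raw_log.splitlines():
        # Header lines start in column 0; message lines are indented.
        if line.startswith("commit "):
            current = line.split()[1]
            parents[current] = []
        elif line.startswith("parent ") and current is not None:
            parents[current].append(line.split()[1])
    listed = set(parents)
    result = {}
    for commit, commit_parents in parents.items():
        missing = [p for p in commit_parents if p not in listed]
        if missing:
            result[commit] = missing
    return result

# Sample where commit c2 was omitted because it did not touch the project.
sample = """commit c3
tree t3
parent c2
author A U Thor <author@example.com> 0 +0000

    touch the project again
commit c1
tree t1
author A U Thor <author@example.com> 0 +0000

    initial import
"""
```

Here dangling_parents(sample) reports that c3 points at c2, which is not
in the listing, so a history rewriter would still have to work out that
c1 is the parent to use.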

> You are handling grafts by hand because --pretty=raw is special
> in that it displays the real parents (although traversal does
> use grafts).  Maybe it would have helped if we had a --pretty
> format that is similar to raw but rewrites the parents?

Yes, that would help.  We could then avoid dealing with grafts manually.
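
As a rough idea of what dealing with grafts manually involves: each
line of the grafts file (.git/info/grafts) names a commit followed by
the parents it should appear to have.  The sketch below is not the
git-split code; parse_grafts and effective_parents are hypothetical
names, and the short ids stand in for full hashes.

```python
def parse_grafts(text):
    """Parse grafts-file text into {commit: [grafted parents]}."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        # Defensive: ignore blank lines and lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # A commit listed alone is grafted to have no parents at all.
        grafts[fields[0]] = fields[1:]
    return grafts

def effective_parents(commit_hash, real_parents, grafts):
    """Return the grafted parents if the commit has a graft, else the real ones."""
    return grafts.get(commit_hash, real_parents)
```

A caller would read the real parents from the raw commit and pass them
through effective_parents() before walking the history.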


How would you feel about including git-split in the GIT tree?  We could
certainly write up the necessary documentation for it.

- Josh Triplett
