On Thu, Apr 19, 2007 at 09:43:50AM -0700, Linus Torvalds wrote:
> On Thu, 19 Apr 2007, Johannes Schindelin wrote:
> > Hmm. However, I have to say that cogito serves/d another purpose quite
> > well: Look at what came from cogito into git. Loads of useful
> > enhancements. So, I really have to point to "at this stage", because that
> > sure was not true 18 months ago.
>
> Absolutely. I think there are still some pieces of cogito that we might
> want to migrate into git too, although they're fairly esoteric (ie the
> whole history rewriting thing). And I think we still have some places

I actually have a fairly simple history rewriting script (written in
python) that I used when I converted some CVS archives to git. It is really
intended for such an initial import and history cleanup case so it doesn't
deal with reflogs and such.

Basic workflow I used is,

- Import CVS archive into a git repository
- Use gitk + the grafts file to clean up history as much as feasible
- Run git-rewrite-history.py which will
  - write out new commit objects with the corrected set of parents
  - copy existing refs to .git/newrefs, pointing them at the new commits.
- start gitk --all to see the tree before the rewrite.
- mv .git/refs .git/oldrefs ; mv .git/newrefs .git/refs
- start a second gitk --all to see the tree after the rewrite.
- compare gitk output to check if everything matches up.
- run git repack/prune/gc to get rid of the old commits, or clone the repo.
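
To illustrate the ordering the rewrite step depends on: a commit's id covers
its parent ids, so a commit can only be written out again once every one of
its rewritten parents has its new id. Here is a minimal Python 3 sketch of
that bookkeeping on a toy graph. It is not the script itself; the `rewrite`
function, the dict-based graph, and the sha1-of-a-string stand-in for
git-hash-object are all assumptions made up for illustration.

```python
import hashlib

def rewrite(parents, changed):
    """parents: {commit: [parent, ...]} holding the corrected parent lists.
    changed: commits whose recorded parents differ from the corrected ones.
    Returns {old_id: new_id} for every commit that had to be rewritten."""
    # Reverse linkage: for each commit, the commits that name it as a parent.
    children = {c: [] for c in parents}
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)

    # wait[c] counts how many rewrites commit c still depends on.
    wait = {c: 0 for c in parents}
    pending = list(changed)
    for c in pending:              # the list grows while we iterate over it
        wait[c] += 1
        if wait[c] == 1:           # first visit: its children must follow
            pending.extend(children[c])

    current = {c: list(ps) for c, ps in parents.items()}
    ready, remap = [], {}

    def pick(c):
        # One dependency of c is resolved; queue c when none remain.
        wait[c] -= 1
        if wait[c] == 0:
            ready.append(c)

    for c in changed:              # the changed commits count themselves once
        pick(c)

    while ready:
        c = ready.pop()
        # Hypothetical stand-in for writing a commit object with git.
        text = ' '.join([c] + current[c])
        new_id = hashlib.sha1(text.encode()).hexdigest()[:8]
        remap[c] = new_id
        for child in children[c]:
            # Point the child at the new id, then release one dependency.
            current[child][current[child].index(c)] = new_id
            pick(child)
    return remap

# Toy history: 'm' merges 'b' and 'c'; 'x' is an unrelated branch off 'a'.
history = {'a': [], 'b': ['a'], 'c': ['a'], 'm': ['b', 'c'], 'x': ['a']}
remap = rewrite(history, ['b', 'c'])
print(sorted(remap))
```

In this toy history the merge 'm' is only hashed after both 'b' and 'c' have
new ids, while 'a' and 'x' are left alone because nothing they descend from
changed.
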

Jan

--8<-----------------------------------------------------------------------
#!/usr/bin/python

import os, sys

def git_write_object(type, blob):
    stdin, stdout = os.popen2("git-hash-object -t %s -w --stdin" % type)
    stdin.write(blob)
    stdin.close()
    return stdout.readline().strip()

def git_commits(branch):
    f = os.popen('git-rev-list --parents --header --topo-order %s' % branch)
    buf = ''
    while 1:
        buf = buf + f.read(4096)
        if not buf: break
        if not '\0' in buf: continue
        commit, buf = buf.split('\0', 1)
        yield Commit(commit)

def git_update_ref(name, hash):
    os.system('git-update-ref "%s" "%s"' % (name, hash))

grafts = []
pending = []
rewriteable = []
remap = {}
todo = 0

class Commit:
    def __init__(self, commit):
        global grafts
        lines = commit.split('\n')
        parts = lines.pop(0).split()
        self.hash, self.parents = parts[0], parts[1:]
        self.tree = lines.pop(0)

        parents = []
        while lines[0][:7] == 'parent ':
            parents = parents + lines.pop(0).split()[1:]

        if parents != self.parents:
            grafts.append(self.hash)

        commit = []
        while 1:
            line = lines.pop(0)
            commit.append(line)
            if not line: break

        for line in lines:
            commit.append(line[4:])

        self.commit = '\n'.join(commit)
        self.wait = 0
        self.children = []

    def mark(self):
        global todo, pending
        self.wait = self.wait + 1
        if self.wait == 1:
            todo = todo + 1
            for child in self.children:
                pending.append(child.hash)

    def pick(self):
        global rewriteable
        self.wait = self.wait - 1
        if not self.wait:
            rewriteable.append(self)

    def fixup(self, old_hash, new_hash):
        i = self.parents.index(old_hash)
        self.parents[i] = new_hash
        self.pick()

    def rehash(self):
        global todo, remap
        todo = todo - 1
        blob = self.tree + '\n'
        for parent in self.parents:
            blob = blob + 'parent %s\n' % parent
        blob = blob + self.commit
        new_hash = git_write_object('commit', blob)
        remap[self.hash] = new_hash
        for child in self.children:
            child.fixup(self.hash, new_hash)

print "Reading commits... ",
commits = {}
for commit in git_commits('--all'):
    commits[commit.hash] = commit
print "read %d commits, found %d grafts" % (len(commits), len(grafts))

print "Setting up reverse linkage"
for commit in commits.values():
    for parent in commit.parents:
        commits[parent].children.append(commit)

print "Propagating graft information... ",
# first mark all commits that will have to be rewritten.
for commit in grafts:
    commits[commit].mark()
for commit in pending:
    commits[commit].mark()

# pick those commits that do not depend on any earlier rewrites
for commit in grafts:
    commits[commit].pick()
print "%d commits need to be rewritten" % todo

print "Rewriting commits... "
while rewriteable:
    print "\rrewriting %5d/%5d commits" % (len(rewriteable), todo),
    rewriteable.pop().rehash()
print "done..."

print "Rewriting refs..."
for ref in os.popen('git-for-each-ref'):
    hash, type, name = ref.split()
    if type != 'commit': continue
    if remap.has_key(hash):
        hash = remap[hash]
    # write updated refs to .git/newrefs
    git_update_ref('new' + name, hash)
print "done..."