git-union-merge proposal

Lately some tools have been storing data in git branches or refs that is
not source code and that is designed to be merged automatically.
Generally merge=union will work for such data, but the problem is that
git-merge can only operate on the checked-out branch, not on other
branches.

So these things all deal with merging in their own ad-hoc ways:

* pristine-tar commits the info it needs to reconstruct tarballs
  to a pristine-tar branch. Files in the branch should not easily conflict,
  as each includes the name of the tarball, but when git pull
  cannot fast-forward the pristine-tar branch, the user is left to
  fix it manually.
* git-annex stores location tracking information to log files in
  .git-annex/; gitattributes is configured to use merge=union,
  and the log files have timestamps or are otherwise structured to be
  safely merged.
* git notes merge -s cat_sort_uniq
  Notes are stored in a tree keyed by the object sha, which can be
  union merged when the notes' format is a series of independent lines.
* probably other tools do things like this too, or will ...
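
As an illustration of the cat_sort_uniq idea mentioned in the list above,
here is a minimal Haskell sketch. It is only a model of the behaviour
(concatenate both sides, sort the lines, drop duplicates), not git's code,
and the name catSortUniq is made up for this example.

```haskell
import Data.List (nub, sort)

-- Hypothetical model of a cat_sort_uniq style merge: take the lines of
-- both sides, sort them, and keep one copy of each distinct line.
-- Splitting each side with `lines` first means a missing trailing
-- newline on one side cannot glue two lines together.
catSortUniq :: String -> String -> String
catSortUniq ours theirs = unlines . nub . sort $ lines ours ++ lines theirs
```

This only makes sense when line order carries no meaning, which is the
case for data that is a series of independent lines.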

So I've written a prototype of a git-union-merge that could be used
for all of these. It works like this:

git union-merge foo origin/foo refs/heads/foo 

That looks up foo and origin/foo and union merges them together,
producing the new branch refs/heads/foo. New blobs are injected
as needed for unioned files, and the merge commit is generated
without affecting the current working tree and without any
expensive checkouts of the branches. It's pretty fast, since it only
needs to write out a temporary index file.
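
For a single file that differs between the two refs, the union step can be
sketched as below. This mirrors the nub-over-lines approach of the attached
prototype, but the two-argument function and the name unionMergeLines are
assumptions made for illustration.

```haskell
import Data.List (nub)

-- Sketch of a per-file union merge: all lines of the first version in
-- their original order, followed by any lines of the second version
-- that have not been seen yet. Note that nub also collapses duplicate
-- lines within a single version, and is quadratic in the line count.
unionMergeLines :: String -> String -> String
unionMergeLines a b = unlines . nub $ lines a ++ lines b
```

The result would then be written back as a new blob and staged in place of
the file from the first ref.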

The prototype is attached. I doubt it would be suitable for git as-is,
but it does show how this is accomplished, if you've not already
seen how to do it: just look for ls-tree, diff-tree,
show, hash-object, and update-index. Note that merging file modes is
not yet dealt with.
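
The glue between diff-tree and update-index is mostly string handling. The
sketch below shows one way to do it, under some assumptions: the input is
the output of `git diff-tree --raw -z --no-renames` already split on NUL
(with any trailing empty field dropped), and the names Change, parseRaw and
indexInfoLine are hypothetical, not part of the prototype.

```haskell
-- One changed path, as reported by a raw diff-tree record.
data Change = Change
  { oldSha :: String
  , newSha :: String
  , path   :: FilePath
  } deriving (Eq, Show)

-- With -z and --no-renames, each change is two NUL-separated fields:
-- ":oldmode newmode oldsha newsha status" followed by the path.
-- Returns Nothing if the field list does not have that shape.
parseRaw :: [String] -> Maybe [Change]
parseRaw [] = Just []
parseRaw (info : file : rest) =
  case words info of
    [_oldMode, _newMode, a, b, _status] ->
      (Change a b file :) <$> parseRaw rest
    _ -> Nothing
parseRaw _ = Nothing

-- A line in ls-tree format, as accepted by `git update-index --index-info`.
-- The mode is hard-coded here, so file modes are not merged.
indexInfoLine :: String -> FilePath -> String
indexInfoLine sha file = "100644 blob " ++ sha ++ "\t" ++ file
```

Each parsed Change whose newSha is not all zeros would get a merged blob,
and the resulting lines are fed to update-index after the ls-tree output of
the first ref, so that the merged entries override the originals.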

I imagine a git that can have union merge or other custom automated
merge strategies configured on a per-branch basis, so that git pull
automatically merges branches. That could be a good basis for adding
Fossil-like features to git.

-- 
see shy jo
{- git-union-merge program
 -
 - Copyright 2011 Joey Hess <joey@xxxxxxxxxxx>
 -
 - Licensed under the GNU GPL version 3 or higher.
 -}

import System.Environment
import System.FilePath
import System.Directory
import System.Cmd.Utils
import System.Posix.Env (setEnv)
import Control.Monad (when)
import Data.List
import Data.Maybe
import Data.String.Utils

import qualified GitRepo as Git
import Utility

header :: String
header = "Usage: git-union-merge ref ref newref"

usage :: IO a
usage = error $ "bad parameters\n\n" ++ header

main :: IO ()
main = do
	[aref, bref, newref] <- parseArgs
	g <- setup
	stage g aref bref
	commit g aref bref newref
	cleanup g

parseArgs :: IO [String]
parseArgs = do
	args <- getArgs
	if (length args /= 3)
		then usage
		else return args

tmpIndex :: Git.Repo -> FilePath
tmpIndex g = Git.workTree g </> Git.gitDir g </> "index.git-union-merge"

{- Configures git to use a temporary index file. -}
setup :: IO Git.Repo
setup = do
	g <- Git.configRead =<< Git.repoFromCwd
	cleanup g -- idempotency
	setEnv "GIT_INDEX_FILE" (tmpIndex g) True
	return g

cleanup :: Git.Repo -> IO ()
cleanup g = do
	e' <- doesFileExist (tmpIndex g)
	when e' $ removeFile (tmpIndex g)

{- Stages the content of both refs into the index. -}
stage :: Git.Repo -> String -> String -> IO ()
stage g aref bref = do
	-- Get the contents of aref, as a starting point.
	ls <- fromgit
		["ls-tree", "-z", "-r", "--full-tree", aref]
	-- Identify files that are different between aref and bref, and
	-- inject merged versions into git.
	diff <- fromgit
		["diff-tree", "--raw", "-z", "-r", "--no-renames", "-l0", aref, bref]
	ls' <- mapM mergefile (pairs diff)
	-- Populate the index file. Later lines override earlier ones.
	togit ["update-index", "-z", "--index-info"]
		(join "\0" $ ls++catMaybes ls')
	where
		fromgit l = Git.pipeNullSplit g (map Param l)
		togit l content = Git.pipeWrite g (map Param l) content
			>>= forceSuccess
		tofromgit l content = do
			(h, s) <- Git.pipeWriteRead g (map Param l) content
			length s `seq` do
				forceSuccess h
				Git.reap
				return ((), s)

		pairs [] = []
		pairs (_:[]) = error "parse error"
		pairs (a:b:rest) = (a,b):pairs rest
		
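		-- diff-tree reports an all-zero sha for a side where the file is absent.
		nullsha = replicate shaSize '0'
		-- File modes are not merged yet; every entry is staged as 100644.
		ls_tree_line sha file = "100644 blob " ++ sha ++ "\t" ++ file
		-- Keep the first occurrence of each line, preserving order.
		unionmerge = unlines . nub . lines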
		
		mergefile (info, file) = do
			let [_colonamode, _bmode, asha, bsha, _status] = words info
			if bsha == nullsha
				then return Nothing -- already staged from aref
				else mergefile' file asha bsha
		mergefile' file asha bsha = do
			let shas = filter (/= nullsha) [asha, bsha]
			content <- Git.pipeRead g $ map Param ("show":shas)
			sha <- getSha "hash-object" $
				tofromgit ["hash-object", "-w", "--stdin"] $
					unionmerge content
			return $ Just $ ls_tree_line sha file

{- Commits the index into the specified branch. -}
commit :: Git.Repo -> String -> String -> String -> IO ()
commit g aref bref newref = do
	tree <- getSha "write-tree"  $
		pipeFrom "git" ["write-tree"]
	sha <- getSha "commit-tree" $
		pipeBoth "git" ["commit-tree", tree, "-p", aref, "-p", bref]
			"union merge"
	Git.run g "update-ref" [Param newref, Param sha]

{- Runs an action that causes a git subcommand to emit a sha, and strips
   any trailing newline, returning the sha. -}
getSha :: String -> IO (a, String) -> IO String
getSha subcommand a = do
	(_, t) <- a
	-- Guard against empty output, where `last` would throw.
	let t' = if not (null t) && last t == '\n'
		then init t
		else t
	when (length t' /= shaSize) $
		error $ "failed to read sha from git " ++ subcommand ++ " (" ++ t' ++ ")"
	return t'

shaSize :: Int
shaSize = 40


