Hacking git for managing machine readable "source" files

Christian Gagneraud <chgans@xxxxxxx> · Mon, 12 Oct 2015 16:32:42 +1300

Hi git hackers,

I have been scratching my head for quite a few weeks to see if and how
I could hack git to manage non-software-source-code files. These files
might be text-based (XML, JSON, custom format, ...) but are not intended 
for humans, thus diffing and merging them using standard git features 
doesn't really make sense (and so the whole "pack" stuff seems useless 
as well). These files represent a non-software project developed using a 
graphical SW application. I'm talking here about designing and 
simulating electronic projects, but it could apply to any sort of
engineering (mechanical design comes second to me).

I would like to provide support for diffing, merging, branching and 
forking such electronics projects.

I know that git is not a conventional SCM and, as such, doesn't
rely on incremental diffs (like CVS, SVN, ...), but...

My graphical software uses a document/command based approach, that is, 
it doesn't directly transform user interaction into graphical changes, 
instead graphical tools generate commands that are then executed on a
document; once they complete, the graphical view updates its
content.

So far, in my context, a document is simply a tree of objects. The
lowest-level commands available are:
- Insert an object in the tree.
- Remove an object from the tree.
- Modify an object property.
All higher-level commands are built in terms of the above basic commands.
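
To make the command model above concrete, here is a minimal sketch in
Python. The Node class, the '/'-separated paths and the place_resistor
command are hypothetical names made up for illustration; they are not
taken from my actual application.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """One object in the document tree (hypothetical structure)."""
    props: Dict[str, Any] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)


def find(root: Node, path: str) -> Node:
    """Follow a '/'-separated path of object ids down from the root."""
    node = root
    for part in filter(None, path.split("/")):
        node = node.children[part]
    return node


def insert_object(root: Node, parent_path: str, object_id: str) -> None:
    """Primitive 1: insert an empty object under the given parent."""
    parent = find(root, parent_path)
    if object_id in parent.children:
        raise KeyError(f"object already exists: {object_id}")
    parent.children[object_id] = Node()


def remove_object(root: Node, parent_path: str, object_id: str) -> None:
    """Primitive 2: remove an object (and its subtree) from the parent."""
    del find(root, parent_path).children[object_id]


def modify_property(root: Node, path: str, name: str, value: Any) -> None:
    """Primitive 3: set one property on an existing object."""
    find(root, path).props[name] = value


def place_resistor(root: Node, sheet: str, ref: str, value: str) -> None:
    """Hypothetical higher-level command built only from the primitives."""
    insert_object(root, sheet, ref)
    modify_property(root, f"{sheet}/{ref}", "kind", "resistor")
    modify_property(root, f"{sheet}/{ref}", "value", value)
```

Because every higher-level command reduces to these three calls, a
recorded list of primitive calls is enough to replay any edit.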

This is, IMHO, an "interesting" feature in the context of traditional 
SCMs. Instead of storing incremental diffs, I could store incremental
commands (I know it would be dead slow, but it would definitely work).

Since git is simply a "content addressable" file system, I can (using 
plumbing commands) create my own system to store my machine-readable 
project: a tree of documents, documents being themselves trees of
objects. This fits pretty well with git commit, tree and blob objects.
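
As a reminder of what "content addressable" means here, the sketch
below computes the id git would give to a blob in a SHA-1 repository:
the hash of a small header ("blob", the byte length, a NUL) followed by
the content. The helper names are mine; only the hashing scheme is
git's.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of this type and body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    """Id of a blob; a serialised document object could be stored as one."""
    return git_object_id("blob", content)


# Hypothetical serialised object from one of my documents.
resistor = b'{"kind":"resistor","value":"10k"}'
print(blob_id(resistor))
```

In practice I would not rehash by hand: "git hash-object -w" writes a
blob, and "git mktree" and "git commit-tree" build the tree and commit
objects on top of it.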

I could even store a serialised command stack (as a tree of command 
objects, again git fits very well here) along with a commit. This would 
represent the set of operations (I call this a document transaction) to 
transform the git document tree associated with the previous commit into 
the git document tree associated with the current commit.
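
A rough sketch of such a document transaction, assuming (for
illustration only) that a document is a flat map of object id to
properties and that commands are small dicts with an "op" field. The
JSON encoding is just one possible serialisation, not a format I have
settled on.

```python
import copy
import json


def apply_command(doc: dict, cmd: dict) -> None:
    """Apply one primitive command to the document in place."""
    op = cmd["op"]
    if op == "insert":
        doc[cmd["id"]] = {}
    elif op == "remove":
        del doc[cmd["id"]]
    elif op == "change":
        doc[cmd["id"]][cmd["prop"]] = cmd["value"]
    else:
        raise ValueError(f"unknown command: {op}")


def replay(doc: dict, transaction: dict) -> dict:
    """Return the document obtained by replaying a transaction on a copy."""
    result = copy.deepcopy(doc)
    for cmd in transaction["commands"]:
        apply_command(result, cmd)
    return result


def serialise(transaction: dict) -> bytes:
    """Deterministic encoding, so equal transactions hash to the same id."""
    text = json.dumps(transaction, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")
```

Replaying the stored transaction on the previous commit's document
should give exactly the current commit's document, which also makes the
pair easy to verify.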

I feel very confident that I could create wrappers around git plumbing 
commands to implement my 3 basic document commands (that would work on
the index):
mygit insert-object <document> <object-id>
mygit remove-object <document> <object-id>
mygit change-object <document> <object-id> <property-id> <property-value>
Of course, for this to work "mygit" needs to be aware of the low-level 
file format (XML, JSON, ...), but "mygit" doesn't need to know how to 
interpret the whole document.
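
A minimal sketch of that command-line surface, assuming a JSON document
laid out as {object-id: {property: value}}. To stay short it edits a
file in the working tree rather than the index, and the function names
are hypothetical.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Parser for the three subcommands listed above."""
    parser = argparse.ArgumentParser(prog="mygit")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("insert-object", "remove-object"):
        p = sub.add_parser(name)
        p.add_argument("document")
        p.add_argument("object_id")
    p = sub.add_parser("change-object")
    p.add_argument("document")
    p.add_argument("object_id")
    p.add_argument("property_id")
    p.add_argument("property_value")
    return parser


def run_command(args: argparse.Namespace, doc: dict) -> dict:
    """Apply one parsed command; property values stay opaque strings."""
    if args.command == "insert-object":
        doc.setdefault(args.object_id, {})
    elif args.command == "remove-object":
        doc.pop(args.object_id)
    else:  # change-object
        doc[args.object_id][args.property_id] = args.property_value
    return doc


def main(argv=None) -> None:
    """Load the JSON document, apply the command, write it back."""
    args = build_parser().parse_args(argv)
    with open(args.document) as fh:
        doc = json.load(fh)
    run_command(args, doc)
    with open(args.document, "w") as fh:
        json.dump(doc, fh, indent=2, sort_keys=True)
```

Note that run_command only needs the container format (here JSON); it
never has to understand what a property means.
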
Storing my document transactions in git would definitely help with 
merging (automatic or manual) and diffing, since document transactions
would have some extra meta-data that tells what the user really did and
why they did it, hence giving hints to the algorithm or the end user on
how to solve a merge conflict for example.
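
One way the recorded commands could help a merge, sketched under my own
assumptions: commands are dicts with "op", "id" and (for changes)
"prop" and "value", and two branches conflict only when they touch the
same object in incompatible ways. This is an illustration, not a worked
out merge algorithm.

```python
def is_whole_object(cmd: dict) -> bool:
    """Insert and remove affect the whole object, not a single property."""
    return cmd["op"] in ("insert", "remove")


def find_conflicts(ours: list, theirs: list) -> list:
    """Return pairs of commands from both branches that cannot both apply."""
    conflicts = []
    for a in ours:
        for b in theirs:
            if a["id"] != b["id"] or a == b:
                continue  # different objects, or the very same edit
            if is_whole_object(a) or is_whole_object(b) or a["prop"] == b["prop"]:
                conflicts.append((a, b))
    return conflicts
```

Edits to different properties of the same object merge cleanly here,
which a line-based diff of a machine-generated file usually cannot
promise.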

Now, from there, I don't know what would be the best approach for
diffing and merging. Should I completely replace the git pack, diff and
merge features? Should I rely on my concept of command and document
transaction? Maybe I should keep the pack feature and simply implement
diff and merge using a "clever" algorithm? (Just by looking at 2 versions
of a document, the algorithm would be able to detect what the purpose of
the change was and replay it on top of another document version.)
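
The simplest version of that second option can be sketched as below,
again assuming a flat {object-id: {property: value}} document and
dict-shaped commands of my own invention. It recovers a command list
from two versions, but only the "what", not the intent: a moved object
would show up as a remove plus an insert.

```python
def diff_documents(old: dict, new: dict) -> list:
    """Derive primitive commands that turn `old` into `new`.

    Assumption: properties are only added or changed, never deleted,
    since the three basic commands have no way to drop a property.
    """
    commands = []
    for oid in sorted(old.keys() - new.keys()):
        commands.append({"op": "remove", "id": oid})
    for oid in sorted(new):
        if oid not in old:
            commands.append({"op": "insert", "id": oid})
        before = old.get(oid, {})
        for prop in sorted(new[oid]):
            if prop not in before or before[prop] != new[oid][prop]:
                commands.append({"op": "change", "id": oid,
                                 "prop": prop, "value": new[oid][prop]})
    return commands
```

The derived list could then be replayed on another version of the
document, at the cost of losing whatever meta-data a recorded
transaction would have carried.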

I'm pretty sure I'm not the first person to look into this. I
would be glad if anyone could provide feedback from their own
experience, advise on how to proceed, or simply offer criticism or
point to literature or existing projects.

Thanks,
Chris