Hi git hackers,
I have been scratching my head for quite a few weeks, trying to see if and
how I could hack git to manage non-software-source-code files. These files
might be text-based (XML, JSON, custom format, ...) but are not intended
for humans, so diffing and merging them with standard git features
doesn't really make sense (and so the whole "pack" stuff seems useless
as well). These files represent a non-software project developed using a
graphical software application. I'm talking here about designing and
simulating electronic projects, but it could be applied to any sort of
engineering (mechanical design is the next one that comes to mind).
I would like to provide support for diffing, merging, branching and
forking such electronics projects.
I know that git is not a conventional SCM, and as such doesn't
rely on incremental diffs (like CVS, SVN, ...), but...
My graphical software uses a document/command-based approach; that is,
it doesn't directly transform user interaction into graphical changes.
Instead, graphical tools generate commands that are then executed on a
document, and once they complete, the graphical view updates its
content.
So far, in my context, a document is simply a tree of objects, and the
lowest-level commands available are:
- Insert an object in the tree.
- Remove an object from the tree.
- Modify an object property.
All higher-level commands are built in terms of these basic commands.
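To make the model concrete, here is a minimal Python sketch, assuming a document is stored as a map from object id to a node with a parent link and a property dictionary. The class and method names (Document, insert_object, reparent_object, ...) are hypothetical. It shows the three primitives and one higher-level command built only from them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Node:
    parent: Optional[str]  # id of the parent object, None for a root
    props: Dict[str, Any] = field(default_factory=dict)


class Document:
    """A document as a tree of objects: object id -> Node (with a parent link)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    # --- the three primitive commands ---
    def insert_object(self, obj_id: str, parent: Optional[str] = None,
                      props: Optional[Dict[str, Any]] = None) -> None:
        if obj_id in self.nodes:
            raise KeyError(f"{obj_id} already exists")
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent {parent}")
        self.nodes[obj_id] = Node(parent, dict(props or {}))

    def remove_object(self, obj_id: str) -> None:
        # Assumption: only leaves may be removed, so a removal never
        # silently drops a whole subtree.
        if any(n.parent == obj_id for n in self.nodes.values()):
            raise ValueError(f"{obj_id} still has children")
        del self.nodes[obj_id]

    def change_object(self, obj_id: str, prop: str, value: Any) -> None:
        self.nodes[obj_id].props[prop] = value

    # --- a higher-level command expressed purely with the primitives ---
    def reparent_object(self, obj_id: str, new_parent: str) -> None:
        # Validate first so a failure cannot leave the object half-moved.
        if new_parent not in self.nodes or new_parent == obj_id:
            raise KeyError(f"invalid new parent {new_parent}")
        saved = dict(self.nodes[obj_id].props)
        self.remove_object(obj_id)          # raises before any change if obj has children
        self.insert_object(obj_id, new_parent)
        for prop, value in saved.items():
            self.change_object(obj_id, prop, value)
```

Keeping higher-level commands as compositions means anything that can record, store or replay the three primitives automatically covers every command in the application.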
This is, IMHO, an "interesting" feature in the context of traditional
SCMs. Instead of storing incremental diffs, I could store incremental
commands (I know it would be dead slow, but it would definitely work).
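A minimal sketch of "storing commands instead of diffs", assuming commands are plain dictionaries with a hypothetical "op" field and a document is a flat map from object id to {"parent", "props"}. Rebuilding a version means replaying every stored transaction from a base state, which is exactly why it gets slow as history grows.

```python
import copy
from typing import Any, Dict, List

Doc = Dict[str, Dict[str, Any]]   # object id -> {"parent": ..., "props": {...}}
Command = Dict[str, Any]          # hypothetical shape, e.g. {"op": "change", ...}


def apply_command(doc: Doc, cmd: Command) -> None:
    """Execute one primitive command in place."""
    op = cmd["op"]
    if op == "insert":
        # Copy the props so later edits to the document never alter the stored command.
        doc[cmd["id"]] = {"parent": cmd.get("parent"), "props": dict(cmd.get("props", {}))}
    elif op == "remove":
        del doc[cmd["id"]]
    elif op == "change":
        doc[cmd["id"]]["props"][cmd["prop"]] = cmd["value"]
    else:
        raise ValueError(f"unknown op {op!r}")


def replay(base: Doc, transactions: List[List[Command]]) -> Doc:
    """Rebuild a version by replaying transactions, in order, on a copy of base.

    Cost grows with the length of history: version N needs all N transactions.
    """
    doc = copy.deepcopy(base)
    for tx in transactions:
        for cmd in tx:
            apply_command(doc, cmd)
    return doc
```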
Since git is simply a "content-addressable" filesystem, I can (using
plumbing commands) create my own system to store my machine-readable
project: a tree of documents, each document being itself a tree of
objects. This fits pretty well with git's commit, tree and blob objects.
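The mapping can be checked without touching a repository: git names every object by SHA-1 over "<type> <size>\0" followed by the body, so the ids that `git hash-object` and `git mktree` would produce can be computed directly. The sketch below assumes a hypothetical layout (one canonical-JSON blob per object, one tree per document, one project tree of documents, with object nesting kept as a "parent" field). It only computes ids; actually writing the objects would still go through plumbing such as `git hash-object -w`, `git mktree` and `git commit-tree`.

```python
import hashlib
import json
from typing import Dict, Tuple


def git_object_id(kind: str, body: bytes) -> str:
    # git hashes b"<kind> <size>\0" + body with SHA-1
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def tree_id(entries: Dict[str, Tuple[str, str]]) -> str:
    """entries: name -> (mode, hex id); mode "100644" for a blob, "40000" for a subtree."""
    def sort_key(name: str) -> bytes:
        mode, _ = entries[name]
        # git orders a subtree as if its name ended with "/"
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, oid = entries[name]
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return git_object_id("tree", body)


def document_tree_id(objects: Dict[str, dict]) -> str:
    """Hypothetical layout: one canonical-JSON blob per object, one tree per document."""
    entries = {
        name: ("100644", blob_id(json.dumps(obj, sort_keys=True).encode()))
        for name, obj in objects.items()
    }
    return tree_id(entries)


def project_tree_id(documents: Dict[str, Dict[str, dict]]) -> str:
    """A project is a tree whose subtrees are documents."""
    return tree_id({name: ("40000", document_tree_id(objs)) for name, objs in documents.items()})
```

Canonical serialization (sorted keys here) matters: two semantically equal objects must produce byte-identical blobs, otherwise git sees a change where there is none.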
I could even store a serialised command stack (as a tree of command
objects, again git fits very well here) along with a commit. This would
represent the set of operations (I call this a document transaction) that
transforms the git document tree associated with the previous commit into
the git document tree associated with the current commit.
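One detail to watch when storing a command stack as a git tree: git orders tree entries by name, byte-wise, so entries named "2" and "10" would come back as "10", "2". A sketch of one possible (hypothetical) convention uses zero-padded names, one JSON blob per command, so reading the tree back in name order yields the original execution order.

```python
import json
from typing import Any, Dict, List

Command = Dict[str, Any]


def transaction_to_entries(commands: List[Command], width: int = 6) -> Dict[str, bytes]:
    """Map each command to a blob body under a zero-padded name ("000002.json" ...)
    so git's byte-wise ordering of tree entries matches execution order."""
    if len(commands) >= 10 ** width:
        raise ValueError("transaction too long for the chosen name width")
    return {
        f"{i:0{width}d}.json": json.dumps(cmd, sort_keys=True).encode()
        for i, cmd in enumerate(commands)
    }


def entries_to_transaction(entries: Dict[str, bytes]) -> List[Command]:
    """Read the commands back in name order, i.e. in execution order."""
    return [json.loads(entries[name]) for name in sorted(entries)]
```

The resulting name/bytes pairs could then be written as blobs and assembled into a tree alongside the document tree of the same commit.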
I feel very confident that I could create wrappers around git plumbing
commands to implement my three basic document commands (working on
the index):
mygit insert-object <document> <object-id>
mygit remove-object <document> <object-id>
mygit change-object <document> <object-id> <property-id> <property-value>
Of course, for this to work "mygit" needs to be aware of the low-level
file format (XML, JSON, ...), but "mygit" doesn't need to know how to
interpret the whole document.
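A rough sketch of such a hypothetical `mygit` wrapper, assuming a document is a single JSON file mapping object ids to {"parent", "props"}. The editing logic only knows that container format and never interprets property values. The index I/O uses real plumbing (`git cat-file blob :<path>`, `git hash-object -w --stdin`, `git update-index --add --cacheinfo`); the optional `--parent` flag is an addition not in the command list above, and CLI property values arrive as plain strings.

```python
import argparse
import json
import subprocess
from typing import List, Optional


def edit_document(raw: bytes, args: argparse.Namespace) -> bytes:
    """Apply one primitive command to a serialized document (JSON container only)."""
    doc = json.loads(raw) if raw.strip() else {}
    if args.cmd == "insert-object":
        if args.object_id in doc:
            raise SystemExit(f"mygit: {args.object_id} already exists")
        doc[args.object_id] = {"parent": args.parent, "props": {}}
    elif args.object_id not in doc:
        raise SystemExit(f"mygit: no object {args.object_id}")
    elif args.cmd == "remove-object":
        del doc[args.object_id]
    else:  # change-object
        doc[args.object_id]["props"][args.property_id] = args.property_value
    # Stable, canonical output keeps blob ids reproducible.
    return (json.dumps(doc, sort_keys=True, indent=1) + "\n").encode()


def read_from_index(path: str) -> bytes:
    # ":<path>" names the blob currently staged for <path>
    return subprocess.run(["git", "cat-file", "blob", f":{path}"],
                          check=True, capture_output=True).stdout


def write_to_index(path: str, data: bytes) -> None:
    oid = subprocess.run(["git", "hash-object", "-w", "--stdin"], input=data,
                         check=True, capture_output=True).stdout.decode().strip()
    subprocess.run(["git", "update-index", "--add", "--cacheinfo",
                    f"100644,{oid},{path}"], check=True)


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="mygit")
    sub = p.add_subparsers(dest="cmd", required=True)
    ins = sub.add_parser("insert-object")
    ins.add_argument("document")
    ins.add_argument("object_id")
    ins.add_argument("--parent", default=None)
    rm = sub.add_parser("remove-object")
    rm.add_argument("document")
    rm.add_argument("object_id")
    ch = sub.add_parser("change-object")
    ch.add_argument("document")
    ch.add_argument("object_id")
    ch.add_argument("property_id")
    ch.add_argument("property_value")
    return p


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    write_to_index(args.document, edit_document(read_from_index(args.document), args))
```

Wired to a `mygit` entry point, `main()` would handle e.g. `mygit change-object board.json R1 value 4k7` inside a git work tree where board.json is already staged.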
Storing my document transactions in git would definitely help with
merging (automatic or manual) and diffing, since a document transaction
would carry extra metadata telling what the user really did and
why they did it, hence giving hints to the algorithm or the end user on
how to resolve a merge conflict, for example.
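As an illustration of how transactions could drive a merge, here is a simplified conflict check between two transactions made from the same base. It assumes each transaction is a dictionary with a hypothetical "why" field plus its list of primitive commands, and reports the recorded reasons next to each conflict so a user can decide. It deliberately ignores subtler cases, such as one side inserting under an object the other side removed.

```python
from typing import Any, Dict, List, Set, Tuple

Command = Dict[str, Any]
Transaction = Dict[str, Any]  # hypothetical: {"why": str, "commands": [Command, ...]}


def _effects(tx: Transaction) -> Tuple[Dict[Tuple[str, str], Any], Set[str], Set[str]]:
    """Summarize a transaction: last value per (object, property),
    plus the sets of removed and inserted object ids."""
    changes: Dict[Tuple[str, str], Any] = {}
    removed: Set[str] = set()
    inserted: Set[str] = set()
    for c in tx["commands"]:
        if c["op"] == "change":
            changes[(c["id"], c["prop"])] = c["value"]
        elif c["op"] == "remove":
            removed.add(c["id"])
        elif c["op"] == "insert":
            inserted.add(c["id"])
    return changes, removed, inserted


def find_conflicts(ours: Transaction, theirs: Transaction) -> List[str]:
    oc, orm, oins = _effects(ours)
    tc, trm, tins = _effects(theirs)
    why = f"(ours: {ours['why']}; theirs: {theirs['why']})"
    conflicts = []
    # Same property set to different values on both sides.
    for obj, prop in sorted(oc.keys() & tc.keys()):
        if oc[(obj, prop)] != tc[(obj, prop)]:
            conflicts.append(f"{obj}.{prop}: {oc[(obj, prop)]!r} vs {tc[(obj, prop)]!r} {why}")
    # An object removed on one side but modified on the other.
    for obj in sorted(orm & {o for o, _ in tc}):
        conflicts.append(f"{obj}: removed by us, modified by them {why}")
    for obj in sorted(trm & {o for o, _ in oc}):
        conflicts.append(f"{obj}: modified by us, removed by them {why}")
    # The same new id created independently.
    for obj in sorted(oins & tins):
        conflicts.append(f"{obj}: inserted on both sides {why}")
    return conflicts
```

Anything not reported can, under these assumptions, be merged by replaying one transaction after the other.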
Now, from there, I don't know what the best approach would be for
diffing and merging. Should I completely replace git's pack, diff and
merge features? Should I rely on my concept of commands and document
transactions? Maybe I should keep the pack feature and simply implement
diff and merge using "clever" algorithms? (Just by looking at two versions
of a document, the algorithm would detect the purpose of the change and
replay it on top of another document version.)
I'm pretty sure I'm not the first person to look into this, so I
would be glad if anyone could provide feedback from their own
experience, advice on how to move forward, or simply criticism or
pointers to literature or existing projects.
Thanks,
Chris