Ian Clarke wrote:
> Provide the merge algorithm with the grammar of the programming
> language, perhaps in the form of a Bison grammar file, or some other
> standardized way to represent a grammar.
>
> The merge algorithm then uses this to parse the files to be diffed
> and/or merged into trees, and then the diff and merge are treated as
> operations on these trees.  These operations may include creating,
> deleting, or moving nodes or branches, renaming nodes, etc.  There has
> been quite a bit (pdf) of academic research on this topic, although I
> haven't yet found off-the-shelf code that will do what we need.
> Still, it shouldn't be terribly hard to implement.

There's a huge flaw in that approach for C/C++: in order to parse C/C++
you first have to preprocess it -- consider the twisty mazes that
#ifdef/#else/#endif can create (a contrived sketch is at the end of
this message).  But in order to preprocess source code you need a whole
heap of extra information that is not in the repository (or, if it is,
cannot be extracted automatically).

For example, you'd have to know all the -D/-U/-I flags that the
makefile or the user might pass to the compiler.  You'd have to
replicate the compiler's complicated header search path algorithm,
which can depend on directives in the code as well as command line
arguments, environment variables, and values specific to the toolchain.
(Don't forget that a repository can hold code that's meant to be
cross-compiled with a toolchain that has its own headers, not the ones
in /usr/include.)  And you'd have to know all the built-in predefined
symbols of that toolchain: what's the value of __GNUC_MINOR__ or
__GNUC_PATCHLEVEL__, is __mips__ or __i386__ defined, and on and on.

And of course the natural conclusion of this progression: a change can
be perfectly grammatically correct for one particular
platform/toolchain/setting of CFLAGS, and completely broken for
another.  There's no way for a VCS to know any of this; it takes human
comprehension.

If you look at a tool like doxygen that attempts to parse C/C++, it
doesn't actually do full preprocessing, only a very limited subset: it
expands only those macros the user names as relevant in the config
file, and it preprocesses only those included headers that match a
pathspec the user provides.  Consequently it cannot fully parse the
code to see whether it's grammatically correct, only to the limited
extent needed to infer where things appear to be defined.  And it is
easily confused: it will "see" code in both halves of an #ifdef section
if it wasn't told anything about the value of the macro in the config
file, which can cause it to incorrectly think that a function or
variable was defined there when in reality that section was discarded
(again, see the sketch at the end).

The idea may have value for languages that are easy to parse and don't
carry all this preprocessor cruft, but I just don't see how it would
provide anything useful for non-trivial changes to real-world C/C++,
which take human eyes to decipher.

Brian
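
P.S.  Here's a contrived sketch of the kind of #ifdef maze I mean (the
macro and function names are made up for illustration).  Without
knowing whether USE_THREADS is defined there is no single parse tree
for this file: the #ifdef spans the function header, so a
grammar-driven merge tool can't even decide what the function's
signature is, let alone diff it as a tree:

    #include <stdio.h>

    struct log { FILE *fp; };

    /* USE_THREADS and flush_log are invented names, purely
       illustrative. */
    #ifdef USE_THREADS
    int flush_log(struct log *log, int flags)   /* threaded builds */
    #else
    int flush_log(struct log *log)              /* everyone else */
    #endif
    {
            /* one body shared by two possible signatures */
            return fflush(log->fp);
    }

Note also that a syntax error inside the USE_THREADS branch leaves the
file perfectly clean for anyone who builds without -DUSE_THREADS -- the
"correct on one platform, broken on another" problem above.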
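
P.P.S.  And a sketch of the doxygen-style confusion, again with
invented names: if the config file says nothing about HAVE_MMAP, a tool
that "sees" both halves of the #ifdef concludes that slurp() is defined
twice, when the compiler only ever sees one of the two definitions:

    /* HAVE_MMAP and slurp are invented names, purely illustrative. */
    #ifdef HAVE_MMAP
    static const char *slurp(void) { return "mmap version"; }
    #else
    static const char *slurp(void) { return "stdio version"; }
    #endif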