On Sat, 10 Jan 2009, Davide Libenzi wrote:
> On Sat, 10 Jan 2009, Jakub Narebski wrote:
>> On Sat, 10 Jan 2009, Johannes Schindelin wrote:
>>> On Sat, 10 Jan 2009, Jakub Narebski wrote:
>>>> Thomas Rast wrote:
>>>>
>>>>> --color-words works (and always worked) by splitting words onto one
>>>>> line each, and using the normal line-diff machinery to get a word
>>>>> diff.
>>>>
>>>> Can't we generalize the diff machinery / use the underlying LCS diff
>>>> engine instead of going through line diff?
>>>
>>> What do you think we're doing?  libxdiff is pretty hardcoded to newlines.
>>> That's why we're substituting non-word characters with newlines.
>>
>> Isn't the Myers algorithm used by libxdiff based on LCS, longest common
>> subsequence, and doesn't it generate, from the mathematical point of
>> view, a "diff" between two sequences (two arrays) which just happen to
>> be lines?  It is a bit strange that libxdiff doesn't export its low
>> level algorithm...
>
> The core doesn't know anything about lines. Only pre-processing (setting
> up the hash by tokenizing the input) and post-processing (adding '\n' to
> the end of each token) know about newlines. Memory consumption would
> increase significantly though, since there is a per-token cost, and a
> word-based diff will create more of them WRT the same input.

Is this core algorithm available as some exported function in libxdiff?
I mean, would it be easy to replace the default line tokenizer (per-line
pre-processing) and the post-processing to deal better with word diff?

Another use would be to generate per-paragraph diffs (with an empty line
as the separator)...

-- 
Jakub Narebski
Poland
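
To illustrate the point that the diff core only sees a sequence of
tokens, here is a minimal sketch in Python. It is not libxdiff and not
git's code: it uses the standard library's difflib.SequenceMatcher (which
is not the Myers algorithm) as a stand-in sequence differ, and the names
split_words and word_diff, as well as the [-...-] / {+...+} markers, are
made up for this example. Only the tokenizer decides whether the "lines"
are really lines, words, or paragraphs.

```python
import difflib
import re


def split_words(text):
    """Hypothetical tokenizer: split text into runs of word characters
    and runs of non-word characters. Each token plays the role that a
    line plays in an ordinary line diff."""
    return re.findall(r"\w+|\W+", text)


def word_diff(old, new, tokenize=split_words):
    """Diff two strings token by token and return one annotated string.

    Removed tokens are wrapped in [-...-] and added tokens in {+...+}.
    The differ itself never looks inside a token; it only compares
    tokens for equality, as a line diff compares whole lines.
    """
    a = tokenize(old)
    b = tokenize(new)
    # autojunk=False: do not let difflib discard frequent tokens
    # (such as single spaces) on long inputs.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if i1 < i2:  # tokens present only in the old text
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if j1 < j2:  # tokens present only in the new text
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)


if __name__ == "__main__":
    print(word_diff("the quick fox", "the slow fox"))
    # prints: the [-quick-]{+slow+} fox
```

Passing a different tokenize function (for example one that splits on
empty lines) would give a per-paragraph diff with the same differ. The
memory concern raised above applies here too: every token becomes its
own list element, so a word-level diff handles many more elements than a
line-level diff of the same input.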