On Fri, Jun 27, 2014 at 10:48 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Even though the original question mentioned "delta discovery", I
> think what was being asked is not "delta" in the Git sense (which
> your answer is about) but is "can we diff two long sequences of text
> (that happens to consist of only 4-letter alphabet but that is a
> irrelevant detail) without holding both in-core in their entirety?",
> which is a more relevant question/desire from the application point
> of view.

.. even there, there's another issue. With enough memory, the diff
itself should be fairly reasonable to do, but we do not have any sane
*format* for diffing those kinds of things.

The regular textual diff is line-based, and is not amenable to
comparing two long lines. You'll just get a diff that says "the two
really long lines are different".

The binary diff option should work, but it is a horrible output
format, and not very helpful. It contains all the relevant data ("copy
this chunk from here to here"), but it's then shown in a binary
encoding that isn't really all that useful if you want to say "what
are the differences between these two chromosomes".

I think it might be possible to just specify a special diff algorithm
(git already supports that, obviously), and just introduce a new "use
binary diffs with a textual representation" model.

But it also sounds like there might be some actual performance problem
with these 1GB file delta-calculations. Which I wouldn't be surprised
about at all.

Jarrad - is there any public data you could give as an example and for
people to play with?

              Linus
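
As a rough illustration of what a "binary diff with a textual
representation" could look like, here is a minimal Python sketch. It is
not git's delta format and not an existing git diff algorithm: the
function names (textual_delta, apply_delta, format_delta) and the output
syntax are made up for this example. It uses the standard library's
difflib.SequenceMatcher, which holds both inputs in memory and is far too
slow for 1GB files, so it only shows the shape of the output, not a
workable implementation.

```python
from difflib import SequenceMatcher


def textual_delta(old, new):
    """Return readable instructions that rebuild `new` from `old`."""
    # autojunk=False: on long inputs the default heuristic treats very
    # frequent symbols as junk, which is every symbol in a 4-letter alphabet.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
            continue
        # "replace", "delete" and "insert" all reduce to: drop old[i1:i2],
        # then emit new[j1:j2] (either side may be empty).
        if i2 > i1:
            ops.append(("delete", i1, i2))
        if j2 > j1:
            ops.append(("insert", i1, new[j1:j2]))
    return ops


def apply_delta(old, ops):
    """Rebuild the new sequence from `old` and the instruction list."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(old[op[1]:op[2]])
        elif op[0] == "insert":
            out.append(op[2])
        # "delete" contributes nothing to the output
    return "".join(out)


def format_delta(ops):
    """Render the instructions as plain text, one per line."""
    lines = []
    for op in ops:
        if op[0] == "insert":
            lines.append(f"insert at {op[1]}: {op[2]}")
        else:
            lines.append(f"{op[0]} {op[1]}..{op[2]}")
    return "\n".join(lines)


if __name__ == "__main__":
    old = "ACGTACGTTTGACA"
    new = "ACGTACCTTTGAACA"
    print(format_delta(textual_delta(old, new)))
```

The point of the copy/delete/insert lines is that they carry the same
information as a binary delta ("copy this chunk from here to here") while
staying readable, and they do not depend on the input having line breaks.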