Junio C Hamano <gitster@xxxxxxxxx> writes: > Jeff King <peff@xxxxxxxx> writes: > >> So ideally we'd have an input format that encapsulates that extra >> context data and provides some mechanism for quoting. And it turns out >> we do: the --raw diff format. > > Funny. The raw diff format indeed was designed as an interchange > format from various "compare two sets of things" front-ends (like > diff-files, diff-cache, and diff-tree) that emits the raw format, to > be read by "diff-helper" (initially called "diff-tree-helper") that > takes the raw format and > > - matches removed and added paths with similar contents to detect > renames and copies > > - computes the output in various formats including "patch". > > So I guess we came a full circle, finally ;-). Looking in the archive > for messages exchanged between junkio@ and torvalds@ mentioning diff > before 2005-05-30 finds some interesting gems. > > https://lore.kernel.org/git/7v1x8zsamn.fsf_-_@xxxxxxxxxxxxxxxxxxxxxxxx/ So, if we were to do what Justin tried to do honoring the overall design of our diff machinery, I think what we can do is as follows: * Use the "diff --raw" output format as the input, but with a bit of twist. (1) a narrow special case that takes only a single diff_filepair of <old> and <new> blobs, and immediately run diff_queue() on that single diff_filepair, which is Justin's use case. For this mode of operation, "flush after reach record of input" may be sufficient. (2) as a general "interchange format" to feed "comparison between two sets of <object, path>" into our diff machinery, we are better off if we can treat the input stream as multiple records that describes comparison between two sets. Imagine "git log --oneline --first-parent -2 --raw HEAD", where one set of "diff --raw" records show the changed blobs with their paths between HEAD~1 and HEAD, and another set does so for HEAD~2 and HEAD~1. We need to be able to tell where the first set ends and the second set starts, so that rename detection and other things, if requested, can be done within each set. My recommendation is to use a single blank line as a separator, e.g. :100644 100644 ce31f93061 9829984b0a M Documentation/git-refs.txt :100644 100644 8b3882cff1 4a74f7c7bd M refs.c :100755 100755 1bfff3a7af f59bc4860f M t/t1460-refs-migrate.sh :100644 100644 c11213f520 8953d1c6d3 M refs/files-backend.c :100644 100644 b2e3ba877d bec5962deb M refs/reftable-backend.c so an application that wants to compare only one diff_filepair at a time would issue something like :100644 100644 ce31f93061 9829984b0a M Documentation/git-refs.txt :100644 100644 8b3882cff1 4a74f7c7bd M refs.c :100755 100755 1bfff3a7af f59bc4860f M t/t1460-refs-migrate.sh so the parsing machinery does not have to worry about case (1) above. * Parse and append the input into diff_queue(), until you see an blank line. - If at EOF you are done, but if you have something accumulated in diff_queue(), show them (like below) first. In any case, at EOF, you are done. * Run diffcore_std() followed by diff_flush() to have the contents of the queue nicely formatted and emptied. Go back to parsing more input lines.