On Thu, Dec 12, 2024 at 10:23:09PM -0600, Justin Tobler wrote:

> To enable support for batch diffs of multiple blob pairs, this
> series introduces a new diff plumbing command git-diff-blob(1). Similar
> to git-diff-tree(1), it provides a "--stdin" option that reads a pair of
> blobs on each line of input and generates the diffs. This is intended to
> be used for scripting purposes where more fine-grained control for diff
> generation is desired. Below is an example for each usage:
>
>   $ git diff-blob HEAD~5000:README.md HEAD:README.md
>
>   $ git diff-blob --stdin <<EOF
>   88f126184c52bfe4859ec189d018872902e02a84 665ce5f5a83647619fba9157fa9b0141ae8b228b
>   HEAD~5000:README.md HEAD:README.md
>   EOF

In the first example, I think just using "git diff" would work (though
it is not a plumbing command). But the stdin example is what's
interesting here anyway, since it can handle arbitrary inputs. So let's
focus on that.

Feeding just blob ids has a big drawback: we don't have any context! So
you get bogus filenames in the patch, no mode data, and so on.

Feeding the paths along with their commits, as you do on the second
line, gives you those things from the lookup context. But it also has
some problems. One, it's needlessly expensive; we have to traverse
HEAD~5000, and then dig into its tree to find the blobs (which
presumably you already did, since how else would you end up with those
oids). And two, there are parsing ambiguities, since arbitrary revision
names can contain spaces. E.g., are we looking for the file
"README.md HEAD:README.md" in HEAD~5000?

So ideally we'd have an input format that encapsulates that extra
context data and provides some mechanism for quoting. And it turns out
we do: the --raw diff format.

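As a rough illustration of why the raw format sidesteps the ambiguity:
each entry carries the modes, oids, and status before a tab and the path
after it, so a path containing spaces still splits cleanly. The sketch
below is not part of the proposed series; parse_raw_line is a
hypothetical helper, and it assumes the non-"-z" output with a single
path per entry (no renames or copies) and no characters that git would
quote.

```shell
# Hypothetical helper: split one raw diff entry of the form
#   :<mode_a> <mode_b> <oid_a> <oid_b> <status><TAB><path>
# into named fields. Assumes one path per entry and no quoted paths.
parse_raw_line () {
	line=$1
	tab=$(printf '\t')

	meta=${line%%"$tab"*}   # everything before the first tab
	path=${line#*"$tab"}    # everything after the first tab
	meta=${meta#:}          # drop the leading colon

	# The metadata fields never contain spaces, so word splitting is safe.
	set -- $meta
	printf 'mode_a=%s mode_b=%s oid_a=%s oid_b=%s status=%s path=%s\n' \
		"$1" "$2" "$3" "$4" "$5" "$path"
}
```

With this, the awkward path from the question above comes out as a
single field, because only the tab separates it from the metadata.
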
If the program takes that format, then you can manually feed it two
arbitrary blob oids if you have them (and put whatever you like for the
mode/path context), like:

  git diff-blob --stdin <<\EOF
  :100644 100644 88f126184c52bfe4859ec189d018872902e02a84 665ce5f5a83647619fba9157fa9b0141ae8b228b M README.md
  EOF

Or you can get the real context yourself (though it seems to me that
this is a gap in what "cat-file --batch" should be able to do in a
single process):

  git ls-tree HEAD~5000 README.md >out
  read mode_a blob oid_a path <out
  git ls-tree HEAD README.md >out
  read mode_b blob oid_b path <out
  printf ":$mode_a $mode_b $oid_a $oid_b M\tREADME.md" |
  git diff-blob --stdin

But it also means you can use --raw output directly. So:

  git diff-tree --raw -r HEAD~5000 HEAD -- README.md |
  git diff-blob --stdin

Now that command by itself doesn't look all that useful; you could have
just asked for patches from diff-tree. But by splitting the two, you can
filter the set of paths in between (for example, to omit some entries,
or to batch a large diff into more manageable chunks for pagination,
etc).

The patch might look something like this:

  https://lore.kernel.org/git/20161201204042.6yslbyrg7l6ghhww@xxxxxxxxxxxxxxxxxxxxx/

:) That is what has been powering the diffs at github.com since 2016 or
so. And continues to do so, as far as I know. I don't have access to
their internal repository anymore, but I've continued to rebase the
topic forward in my personal repo. You can fetch it from:

  https://github.com/peff/git jk/diff-pairs

in case that is helpful.

-Peff
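
The filtering step between the two commands could be sketched roughly as
follows. This is only an illustration, not code from the series or from
the jk/diff-pairs topic: filter_raw is a hypothetical helper, and it
assumes the non-"-z" raw format with one entry per line and a single
path per entry (no renames or copies).

```shell
# Hypothetical filter: read raw diff entries on stdin and drop those
# whose path matches the given shell pattern; pass the rest through.
filter_raw () {
	pattern=$1
	tab=$(printf '\t')

	while IFS= read -r line
	do
		path=${line#*"$tab"}    # the path follows the first tab
		case $path in
		$pattern)
			;;                  # omit this entry
		*)
			printf '%s\n' "$line"
			;;
		esac
	done
}

# Possible use, with the consumer left as whatever command ends up
# reading raw entries on stdin:
#
#   git diff-tree --raw -r HEAD~5000 HEAD |
#   filter_raw 'po/*' |
#   split -l 100 - chunk.
#
# Each chunk.* file then holds at most 100 entries to diff at a time.
```

Keeping the filter as a plain line-oriented stage means the producer and
the consumer never need to know about each other; the raw entries are
the whole interface.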