Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> Sure, but my intention was to synchronize the `--raw` vs the `--patch`
> output: the latter _already_ shows the correct hash. This here patch makes
> the hash in the former's output match the latter's.

That is shooting for a wrong uniformity and breaking consistency
among the `--raw` modes.

    $ git reset --hard
    $ echo "/* end */" >cache.h ;# taint
    $ git diff-files --raw
    ... this shows (M)odified with 0{40} on the postimage
    ... 0{40} for side that is known to have contents from low-level diff
    ... means "object name unknown; figure it out yourself if you need it"
    $ git update-index cache.h
    $ git diff-files --raw
    ... of course we see nothing here.  Wait for a bit.
    $ touch cache.h ;# smudge
    $ git diff-files --raw
    ... this shows (M)odified with 0{40} on the postimage
    ... again, it says "it is stat dirty so I do not bother to compute"
    $ git update-index --refresh
    $ git diff-files --raw
    ... again we see nothing.

Any tools that work on "--raw" output must be already prepared to
see 0{40} on the side that is known to have contents and must know
to grab the contents from the working tree file if they need them,
so showing the 0{40} for i-t-a entry (whose definition is "the user
said in the past that the final contents of the file will be added
later, but Git does not know what object it will be yet") cannot
break them.  And the behaviour of giving 0{40} in such a case aligns
well with what is already done for paths already added to the index
when Git does not have an already-computed object name handy.

> Besides, we're talking about the post-image of `diff-files`, i.e. the
> worktree version, here. I seem to remember that the pre-image already uses
> the all-zero hash to indicate precisely what you mentioned above.

The 0{40} you see for pre-image for (A)dded paths means a completely
different thing from the 0{40} I have been explaining in the above,
so that is not relevant here.
By definition, there are *no* contents for the pre-image side of
(A)dded paths (that is why I stressed the "side that must have
contents" in the above description---it is determined by the type of
the change), but because the format requires us to place some
hexadecimal there, we fill the space with 0{40}.

When we do not know the object name for the side that is known to
have contents without performing extra computation (including "stat
dirty so we cannot tell without rehashing"), we also use 0{40} as a
signal to say "we do not know the actual contents", but the consumer
of "--raw" format is expected to know the difference between "this
side is known to have no data and 0{40} is here as filler" and "this
side must have contents but we see 0{40} because Git does not have
it handy in precomputed form".

The above is the same for "diff-index --raw" without "--cached";
when we have to hash before we can give the object name (e.g. the
path is stat-dirty), we give 0{40} and let the consumer figure it
out if it needs to.

    $ git reset --hard
    $ touch COPYING
    $ git diff-index --raw HEAD
    ... we see (M)odified with 0{40} on the right hand side.

When the caller asks for "--patch" or any other output format that
actually needs contents for output, however, these low-level tools
do read the contents, and as a side effect, they may hash to obtain
the object name and show it [*1*].

By the way, as I do not want to see you waste your time going in a
wrong direction just to be "different", let me make it clear that as
far as the design of low level diff plumbing is concerned, what I
said here is final.  Please don't waste your time on arguing for
changing the design now after 15 years.  I want to see your time
used in a more productive way for the project.

Thanks.

[Footnote]

*1* This division of labor to free "--raw" mode of anything remotely
unnecessary stems from the original diff plumbing design in May 2005
where the "--raw" mode was the only output mode, and there was a
separate "git-diff-helper" (look for it in the mailing list archive
if you want to learn more) that reads a "--raw" output and
transforms it into the patch form.  That "once we have the raw diff,
we can pipe it to post-processing and do more interesting things"
eventually led to the design of the diffcore pipeline where we match
up (A)dded and (D)eleted entries to detect renames, etc.
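
For illustration only, here is a minimal sketch of how a consumer of
"--raw" output might tell apart the two meanings of 0{40} described
in this message.  It is written in Python and is not part of Git;
the names RawEntry, parse_raw_line, side_state and blob_oid are
hypothetical.  It assumes plain (non "-z") output, no rename/copy or
unmerged entries, and a SHA-1 repository.  blob_oid hashes the raw
bytes as given, so it ignores any clean filter or line-ending
conversion that Git itself would apply to a working tree file.

```python
import hashlib
from typing import NamedTuple


class RawEntry(NamedTuple):
    old_mode: str
    new_mode: str
    old_oid: str
    new_oid: str
    status: str  # first letter only, e.g. "M", "A", "D"
    path: str


def parse_raw_line(line: str) -> RawEntry:
    """Parse one line like ':100644 100644 <oid> <oid> M<TAB>path'."""
    meta, path = line.rstrip("\n").split("\t", 1)
    old_mode, new_mode, old_oid, new_oid, status = meta.lstrip(":").split(" ")
    return RawEntry(old_mode, new_mode, old_oid, new_oid, status[0], path)


def is_null(oid: str) -> bool:
    """True when the object name consists only of zeros."""
    return set(oid) == {"0"}


def side_state(entry: RawEntry, side: str) -> str:
    """Classify the 'old' (pre-image) or 'new' (post-image) side.

    'absent'  : the change type says this side has no contents, so
                the zeros are filler.
    'unknown' : this side has contents but the object name was not
                computed; the consumer must read the file itself.
    'known'   : a real object name is given.
    """
    if side == "old":
        has_contents, oid = entry.status != "A", entry.old_oid
    else:
        has_contents, oid = entry.status != "D", entry.new_oid
    if not has_contents:
        return "absent"
    return "unknown" if is_null(oid) else "known"


def blob_oid(data: bytes) -> str:
    """SHA-1 object name of a blob holding exactly these bytes."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

A consumer that sees "unknown" for the post-image of a path would
read the working tree file and, if it needs an object name, compute
one itself (for example with blob_oid above, or by asking "git
hash-object"); a side reported as "absent" has nothing to read.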