Hi Junio,

On Wed, 24 Jun 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
>
> > Sure, but my intention was to synchronize the `--raw` vs the `--patch`
> > output: the latter _already_ shows the correct hash. This here patch makes
> > the hash in the former's output match the latter's.
>
> That is shooting for a wrong uniformity and breaking consistency
> among the `--raw` modes.
>
>     $ git reset --hard
>     $ echo "/* end */" >cache.h ;# taint
>     $ git diff-files --raw
>     ... this shows (M)odified with 0{40} on the postimage
>     ... 0{40} for side that is known to have contents from low-level diff
>     ... means "object name unknown; figure it out yourself if you need it"
>     $ git update-index cache.h
>     $ git diff-files --raw
>     ... of course we see nothing here. Wait for a bit.
>     $ touch cache.h ;# smudge
>     $ git diff-files --raw
>     ... this shows (M)odified with 0{40} on the postimage
>     ... again, it says "it is stat dirty so I do not bother to compute"
>     $ git update-index --refresh
>     $ git diff-files --raw
>     ... again we see nothing.
>
> Any tools that work on "--raw" output must be already prepared to
> see 0{40} on the side that is known to have contents and must know
> to grab the contents from the working tree file if they need them,
> so showing the 0{40} for i-t-a entry (whose definition is "the user
> said in the past that the final contents of the file will be added
> later, but Git does not know what object it will be yet") cannot
> break them. And the behaviour of giving 0{40} in such a case aligns
> well with what is already done for paths already added to the index
> when Git does not have an already-computed object name handy.

Well, don't you know, I never realized that the hash shown by `git
diff-files --raw` for modified files was all-zero while `git diff-files
-p` showed the computed one matching the current worktree version!

> > Besides, we're talking about the post-image of `diff-files`, i.e.
> > the worktree version, here. I seem to remember that the pre-image already uses
> > the all-zero hash to indicate precisely what you mentioned above.
>
> The 0{40} you see for pre-image for (A)dded paths means a completely
> different thing from the 0{40} I have been explaining in the above,
> so that is not relevant here.
>
> By definition, there is *no* contents for the pre-image side of
> (A)dded paths (that is why I stressed the "side that must have
> contents" in the above description---it is determined by the type of
> the change), but because the format requires us to place some
> hexadecimal there, we fill the space with 0{40}.
>
> When we do not know the object name for the side that is known to
> have contents without performing extra computation (including "stat
> dirty so we cannot tell without rehashing"), we also use 0{40} as a
> signal to say "we do not know the actual contents", but the consumer
> of "--raw" format is expected to know the difference between "this
> side is known to have no data and 0{40} is here as filler" and "this
> side must have contents but we see 0{40} because Git does not have
> it handy in precomputed form".
>
> The above is the same for "diff-index --raw" without "--cached";
> when we have to hash before we can give the object name (e.g. the
> path is stat-dirty), we give 0{40} and let the consumer figure it
> out if it needs to.
>
>     $ git reset --hard
>     $ touch COPYING
>     $ git diff-index --raw HEAD
>     ... we see (M)odified with 0{40} on the right hand side.
>
> When the caller asks for "--patch" or any other output format that
> actually needs contents for output, however, these low-level tools
> do read the contents, and as a side effect, they may hash to obtain
> the object name and show it [*1*].
>
> By the way, as I do not want to see you waste your time going in a
> wrong direction just to be "different", let me make it clear that as
> far as the design of low level diff plumbing is concerned, what I
> said here is final. Please don't waste your time on arguing for
> changing the design now after 15 years. I want to see your time
> used in a more productive way for the project.

Thank you for patiently explaining to me something I managed to miss for a
decade and a half. I'll send out v4 in a moment.

Ciao,
Dscho

>
> Thanks.
>
>
> [Footnote]
>
> *1* This division of labor to free "--raw" mode of anything remotely
>     unnecessary stems from the original diff plumbing design in May
>     2005 where the "--raw" mode was the only output mode, and there
>     was a separate "git-diff-helper" (look for it in the mailing
>     list archive if you want to learn more) that reads a "--raw"
>     output and transforms it into the patch form. That "once we
>     have the raw diff, we can pipe it to post-processing and do more
>     interesting things" eventually led to the design of the diffcore
>     pipeline where we match up (A)dded and (D)eleted entries to
>     detect renames, etc.
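
For readers following along, the two meanings of 0{40} described in the
quoted text can be sketched from the point of view of a "--raw" consumer.
This is only an illustration, not code from Git or from the patch series:
the names `parse_raw_line`, `describe_side` and `blob_oid` are made up for
the example, and it assumes a SHA-1 repository, plain (non-combined, non
`-z`) raw lines, and that the type of change can be read from the first
character of the status field.

```python
import hashlib
from typing import NamedTuple


class RawEntry(NamedTuple):
    src_mode: str
    dst_mode: str
    src_oid: str
    dst_oid: str
    status: str   # e.g. "M", "A", "D", or "R100" for a scored rename
    path: str     # for renames/copies this still holds "src<TAB>dst"


def parse_raw_line(line: str) -> RawEntry:
    """Parse one line shaped like ':100644 100644 <oid> <oid> M<TAB>path'."""
    meta, path = line.rstrip("\n").split("\t", 1)
    src_mode, dst_mode, src_oid, dst_oid, status = meta.lstrip(":").split(" ")
    return RawEntry(src_mode, dst_mode, src_oid, dst_oid, status, path)


def describe_side(entry: RawEntry, side: str) -> str:
    """Classify the object name on one side ('src' or 'dst').

    'known'   - a real object name is present
    'filler'  - the side has no contents by definition (pre-image of an
                (A)dded path, post-image of a (D)eleted path)
    'unknown' - the side has contents, but the name was not computed;
                the consumer must hash the working tree file itself
    """
    oid = entry.src_oid if side == "src" else entry.dst_oid
    if set(oid) != {"0"}:
        return "known"
    kind = entry.status[0]
    no_contents = (side == "src" and kind == "A") or (side == "dst" and kind == "D")
    return "filler" if no_contents else "unknown"


def blob_oid(data: bytes) -> str:
    """Compute the SHA-1 blob name for contents, as 'git hash-object' would
    for unfiltered data (no clean/CRLF conversion is applied here)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

A consumer that sees 'unknown' for the post-image would read the file from
the working tree and pass its bytes to something like `blob_oid` if it
really needs the name; for 'filler' there is nothing to read.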