On September 12, 2018 7:00 PM, Junio C Hamano wrote:
> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
>
>> author is important to our process. My objective is to keep the
>> original file 100% exact as supplied and then ignore any changes to
>> the metadata that I don't care about (like Creator) if the remainder
>> of the file is the same.
>
> That will *not* work. If person A gave you a version of original, which
> hashes to X after you strip the cruft you do not care about, you would
> register that original with person A's fingerprint on under the name of X.
> What happens when person B gives you another version, which is not
> byte-for-byte identical to the one you got earlier from person A, but
> does hash to the same X after you strip the cruft? If you are going to
> store it in Git, and if by SHA-1 you are calling what we perceive as
> "object name" in Git land, you must store that one with person B's
> fingerprint on it also under the name of X. Now which version will you
> get from Git when you ask it to give you the object that hashes to X?

The scenario is slightly different.

1. Person A gives me a new binary file-1 with fingerprint A1. This goes
   into git unchanged.
2. Person B gives me binary file-2 with fingerprint B2. This does not go
   into git yet.
3. We attempt a git diff between the committed file-1 and the uncommitted
   file-2 using a textconv implementation that strips what we don't need
   to compare.
4. If file-1 and file-2 have no differences when textconv is used, file-2
   is not added and not committed. It is discarded with impunity, never
   to be seen again, although we might whine a lot at the user for
   attempting to put file-2 in - but that's not git's issue.
5. If file-1 and file-2 have differences when textconv is used, file-2 is
   committed with fingerprint B2.
6. Even if the user makes an error and commits file-2 with B2 regardless
   of textconv, there will be a human who complains about it, but git
   still has two unambiguous fingerprints for files that happen to have
   no diffs after textconv is applied.

My original hope was that textconv could be used to influence the
fingerprint, but I do not think that is the case, so I went with an
alternative. In the application, I am not allowed to strip any cruft off
file-1 when it is stored - it must be byte-for-byte the original file.
This application is marginally related to a DRM-like situation where we
only care about the original image provided by a user; any copies
provided by another user with modified metadata will be disallowed from
the repository.

Does that make more sense?

Cheers,
Randall
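
For illustration only, the comparison in steps 3 to 5 can be sketched in Python. This is not the actual textconv filter: the `Creator:` line format, the sample files, and the helper names `textconv`, `git_blob_id` and `should_commit` are all assumptions made up for the example. The only part taken from git itself is how a blob's object name is computed from the unmodified bytes in a SHA-1 repository.

```python
import hashlib
import re

# Hypothetical metadata field to ignore; a real binary format would need
# a real parser rather than a line-based regular expression.
IGNORED_FIELD = re.compile(rb"^Creator:.*\n", re.MULTILINE)


def textconv(data: bytes) -> bytes:
    """Stand-in for a textconv filter: drop the fields we do not compare."""
    return IGNORED_FIELD.sub(b"", data)


def git_blob_id(data: bytes) -> str:
    """Object name git gives the unmodified bytes (SHA-1 repositories)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def should_commit(committed: bytes, candidate: bytes) -> bool:
    """Steps 4 and 5: accept the candidate only if it differs after textconv."""
    return textconv(committed) != textconv(candidate)


# Made-up sample contents: file_2 differs from file_1 only in Creator,
# file_3 also differs in Author.
file_1 = b"Creator: A\nAuthor: X\npayload\n"
file_2 = b"Creator: B\nAuthor: X\npayload\n"
file_3 = b"Creator: B\nAuthor: Y\npayload\n"

print(should_commit(file_1, file_2))
print(should_commit(file_1, file_3))
print(git_blob_id(file_1) == git_blob_id(file_2))
```

In this sketch file_2 would be discarded and file_3 committed, while file_1 and file_2 still get distinct object names, because git hashes the stored bytes and not the textconv output.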