Junio C Hamano <gitster@xxxxxxxxx> writes:

> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
>
>> The scenario is slightly different.
>> 1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
>> git unchanged.
>> 2. Person B gives me binary file-2 with fingerprint B2. This does not go
>> into git yet.
>> 3. We attempt a git diff between the committed file-1 and uncommitted file-2
>> using a textconv implementation that strips what we don't need to compare.
>> 4. If file-1 and file-2 have no difference when textconv is used, file-2 is
>> not added and not committed. It is discarded with impunity, never to be seen
>> again, although we might whine a lot at the user for attempting to put
>> file-2 in - but that's not git's issue.
>
> You are forgetting that Git is a distributed version control system,
> aren't you? Person A and B can introduce their "moral equivalent
> but bytewise different" copies to their repository under the same
> object name, and you can pull from them--what happens?
>
> It is fundamental that one object name given to Git identifies one
> specific byte sequence contained in an object uniquely. Once you
> broke that, you no longer have Git.

Having said all that, if you want to keep the original with frills
but somehow group these bytewise different things that reduce to the
same essence (e.g. when passed through a filter like textconv), I
suspect a better approach might be to store both the "original" and
the result of passing the "original" through the filter in the object
database. In the above example, you'll get two "original" objects
from person A and person B, plus one "canonical" object that is
bytewise different from either of these two originals, but is what
they both reduce to when you use the filter on them.
Then you record the fact that the "essence" object can be derived by
passing either person A's or person B's "original" through the
filter, perhaps by using "git notes" attached to the "essence" object
to record the object names of these originals. (The reason for using
notes in this direction is that you can mechanically determine which
"essence" object any given "original" object reduces to; it is just a
matter of passing it through the filter. But more than one "original"
can reduce to the same "essence", so that direction of the mapping is
the one that has to be recorded.)
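
To make the bookkeeping concrete, here is a rough sketch in Python of
the mapping described above. It is only an illustration under stated
assumptions: strip_frills() is a made-up stand-in for whatever filter
you would really use, the "#fingerprint" line format is invented, and
the in-memory originals_of table merely plays the role that notes
attached to the "essence" object would play. The only part taken from
Git itself is how a blob's object name is computed in a SHA-1
repository.

```python
import hashlib
from collections import defaultdict


def blob_name(data: bytes) -> str:
    """Object name Git gives a blob with these bytes (SHA-1 repository)."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def strip_frills(data: bytes) -> bytes:
    """Hypothetical filter: drop lines that start with b'#fingerprint'."""
    kept = [line for line in data.splitlines(keepends=True)
            if not line.startswith(b"#fingerprint")]
    return b"".join(kept)


# essence object name -> object names of the originals that reduce to it.
# This table stands in for notes attached to the "essence" object.
originals_of = defaultdict(set)


def record(original: bytes) -> str:
    """Reduce an original to its essence and remember the relationship."""
    essence_name = blob_name(strip_frills(original))
    originals_of[essence_name].add(blob_name(original))
    return essence_name


# Invented sample data: same payload, different fingerprints.
file_1 = b"#fingerprint A1\npayload\n"   # from person A
file_2 = b"#fingerprint B2\npayload\n"   # from person B

essence_a = record(file_1)
essence_b = record(file_2)
```

Both originals keep their own distinct object names, so the rule that
one name identifies one byte sequence is never bent; essence_a and
essence_b come out equal, and the table lists both originals under
that single "essence" name.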