On September 13, 2018 1:52 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
> >
> >> The scenario is slightly different.
> >> 1. Person A gives me a new binary file-1 with fingerprint A1. This
> >> goes into git unchanged.
> >> 2. Person B gives me binary file-2 with fingerprint B2. This does not
> >> go into git yet.
> >> 3. We attempt a git diff between the committed file-1 and uncommitted
> >> file-2 using a textconv implementation that strips what we don't
> >> need to compare.
> >> 4. If file-1 and file-2 have no difference when textconv is used,
> >> file-2 is not added and not committed. It is discarded with impunity,
> >> never to be seen again, although we might whine a lot at the user for
> >> attempting to put file-2 in - but that's not git's issue.
> >
> > You are forgetting that Git is a distributed version control system,
> > aren't you? Person A and B can introduce their "moral equivalent but
> > bytewise different" copies to their repository under the same object
> > name, and you can pull from them--what happens?
> >
> > It is fundamental that one object name given to Git identifies one
> > specific byte sequence contained in an object uniquely. Once you
> > broke that, you no longer have Git.
>
> Having said all that, if you want to keep the original with frills but
> somehow give these bytewise different things that reduce to the same
> essence (e.g. when passed thru a filter like textconv), I suspect a
> better approach might be to store both the "original" and the result of
> passing the "original" through the filter in the object database. In
> the above example, you'll get two "original" objects from person A and
> person B, plus one "canonical" object that are bytewise different from
> either of these two originals, but what they reduce to when you use the
> filter on them. Then you record the fact that to derive the "essence"
> object, you can reduce either person A's or person B's "original"
> through the filter, perhaps by using "git notes" attached to the
> "essence" object, recording the object names of these originals (the
> reason why using notes in this direction is because you can
> mechanically determine which "essence" object any given "original"
> object reduces to---it is just the matter of passing it through the
> filter. But there can be more than one "original" that reduces to the
> same "essence").

I like that idea. It turns the reduced object into a contract. Thanks.
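
To make the shape of that suggestion concrete, here is a minimal sketch in Python. It models the idea in memory and does not touch a real repository. The `Store` class, the `strip_frills` filter (which simply drops lines starting with `#`), and the `notes` map are all hypothetical stand-ins for the object database, a textconv-style filter, and "git notes". Only `blob_id` mirrors real behaviour: it computes the name a SHA-1 repository gives a blob, the same value `git hash-object` prints.

```python
import hashlib
from collections import defaultdict


def blob_id(data: bytes) -> str:
    """Name a SHA-1 Git repository assigns to a blob with exactly these bytes."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def strip_frills(data: bytes) -> bytes:
    """Hypothetical filter: drop '#' lines, standing in for a real textconv."""
    kept = [line for line in data.splitlines() if not line.startswith(b"#")]
    return b"\n".join(kept) + b"\n"


class Store:
    """Toy object database plus a notes map keyed by the essence object."""

    def __init__(self):
        self.objects = {}              # object name -> exact bytes
        self.notes = defaultdict(set)  # essence name -> names of originals

    def add_original(self, data: bytes):
        # The original is stored byte for byte under its own name.
        original = blob_id(data)
        self.objects[original] = data
        # The essence is a separate object with its own, different name.
        essence_bytes = strip_frills(data)
        essence = blob_id(essence_bytes)
        self.objects[essence] = essence_bytes
        # The note hangs off the essence, since that side is one-to-many.
        self.notes[essence].add(original)
        return original, essence


store = Store()
a_orig, a_ess = store.add_original(b"# fingerprint A1\npayload\n")
b_orig, b_ess = store.add_original(b"# fingerprint B2\npayload\n")
print(a_orig != b_orig, a_ess == b_ess, len(store.notes[a_ess]))
```

Each object name still identifies exactly one byte sequence, so nothing about the object model is bent. The equivalence between the two originals lives only in the note on the shared essence object, and going the other way (original to essence) needs no record because it is just a matter of running the filter again.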