On Sun, Apr 17, 2022 at 8:17 PM Andrew Oakley <andrew@xxxxxxxxxxxxx> wrote:
>
> The way I look at it is that you both read and write bytes, and you may
> attempt to decode and re-encode text on the way. Both the decoding and
> the encoding are done in metadata_stream_to_writable_bytes, so nothing
> else needs to know about the raw option being different.
>

Right - personally I just believe making the distinction explicit as
"strategies" makes for a less magical explanation than a special
encoding value that's not just a different encoding but also a
different behavior. In other aspects, the behavior you're proposing
(except for the final fallback-decoding-failure) seems to be equivalent
to what I've implemented in the latest version.

> > I understand and share the data loss concern.
> >
> > As I just answered Ævar, I *think* I'd like to address the data loss
> > concern by escaping all x80+ bytes if something cannot be interpreted
> > even using the fallback encoding. In a commit message there could also
> > be a suffix explaining what happened, although I suspect that's
> > pointless complexity. The advantage of this approach is that it makes
> > it *possible* to reconstruct the original bytestream precisely, but
> > without creating badly-encoded git commit messages that need to be
> > skirted around.
>
> I think this gets pretty messy though. In my opinion it's not any nicer
> than putting the raw bytes in the commit message.
>
> Git does not make any attempt enforce the commit metadata encoding, so I
> think that tools really should make an attempt to handle invalid data in
> a somewhat sensible fashion.
>
> I don't think there is really a "right" answer, anything reasonable
> would be better than what we've got now.

Alright - I went ahead with the "escape if you can't do it right"
behavior anyway, because it makes me feel better about being able to
say "no information loss" :)
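
For anyone following along, here is a rough sketch of the "escape if you
can't do it right" idea. This is not the code from the patch: the names
to_writable_bytes and unescape, the cp1252 default and the %XX escape
format are all assumptions made for illustration. It also escapes '%'
itself, which is my own addition so that the escaped form can be
reversed unambiguously.

```python
def to_writable_bytes(raw, fallback_encoding="cp1252"):
    """Return bytes that are safe to write as UTF-8 metadata.

    Hypothetical helper: tries UTF-8, then a fallback encoding, and
    only escapes when neither decoding works.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, pass through unchanged
    except UnicodeDecodeError:
        pass
    try:
        # Re-encode text that is valid in the fallback encoding.
        return raw.decode(fallback_encoding).encode("utf-8")
    except UnicodeDecodeError:
        pass
    # Last resort: escape '%' (0x25) and every byte >= 0x80 as %XX.
    out = bytearray()
    for b in raw:
        if b >= 0x80 or b == 0x25:
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)


def unescape(escaped):
    """Invert the %XX escaping applied by the last-resort branch."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == 0x25:
            out.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(escaped[i])
            i += 1
    return bytes(out)
```

Note that unescape only makes sense for values that actually went
through the escaping branch; the sketch does not record which branch
was taken, so a real tool would need some marker (the "suffix" idea
quoted above, for instance) to know when to apply it.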