Re: [PATCH v2] [RFC] git-p4: improve encoding handling to support inconsistent encodings

Tao Klerks <tao@xxxxxxxxxx> · Tue, 19 Apr 2022 22:30:10 +0200

On Sun, Apr 17, 2022 at 8:17 PM Andrew Oakley <andrew@xxxxxxxxxxxxx> wrote:
>
>
> The way I look at it is that you both read and write bytes, and you may
> attempt to decode and re-encode text on the way.  Both the decoding and
> the encoding are done in metadata_stream_to_writable_bytes, so nothing
> else needs to know about the raw option being different.
>

Right - personally I just believe making the distinction explicit as
"strategies" makes for a less magical explanation than a special
encoding value that's not just a different encoding but also a
different behavior.

In other aspects, the behavior you're proposing (except for the final
fallback-decoding-failure) seems to be equivalent to what I've
implemented in the latest version.

>
> > I understand and share the data loss concern.
> >
> > As I just answered Ævar, I *think* I'd like to address the data loss
> > concern by escaping all x80+ bytes if something cannot be interpreted
> > even using the fallback encoding. In a commit message there could also
> > be a suffix explaining what happened, although I suspect that's
> > pointless complexity. The advantage of this approach is that it makes
> > it *possible* to reconstruct the original bytestream precisely, but
> > without creating badly-encoded git commit messages that need to be
> > skirted around.
>
> I think this gets pretty messy though.  In my opinion it's not any nicer
> than putting the raw bytes in the commit message.
>
> Git does not make any attempt to enforce the commit metadata encoding, so I
> think that tools really should make an attempt to handle invalid data in
> a somewhat sensible fashion.
>
> I don't think there is really a "right" answer, anything reasonable
> would be better than what we've got now.

Alright - I went ahead with the "escape if you can't do it right"
behavior anyway, because it makes me feel better about being able to
say "no information loss" :)
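
For anyone following along, here is a rough sketch of the behavior
described above: pass valid UTF-8 through, otherwise try a fallback
encoding, and if that also fails, escape every byte at 0x80 or above.
The function name, the strategy names, the cp1252 default and the
"%XX" escape format are assumptions made for illustration, not
necessarily what the patch does.

```python
def metadata_to_writable_bytes(raw, strategy="fallback",
                               fallback_encoding="cp1252"):
    """Return bytes suitable for a git commit message.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if strategy == "passthrough":
        # Write whatever Perforce gave us, valid or not.
        return raw

    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass

    try:
        # Reinterpret the bytes in the fallback encoding, re-encode as UTF-8.
        return raw.decode(fallback_encoding).encode("utf-8")
    except UnicodeDecodeError:
        pass

    # Last resort: keep ASCII as-is and escape each byte >= 0x80 as %XX.
    # The result is pure ASCII, so it is always valid UTF-8.
    # Note: this sketch does not escape a literal '%', so an exact
    # round-trip would additionally need to know the message was escaped.
    escaped = "".join(
        chr(b) if b < 0x80 else "%{:02X}".format(b) for b in raw
    )
    return escaped.encode("ascii")
```

With these assumptions, b"caf\xe9" is not valid UTF-8 but decodes as
cp1252, so it is re-encoded as UTF-8; b"a\x81b" is neither (0x81 is
undefined in Python's cp1252 codec), so it ends up as b"a%81b".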