Re: [PATCH v2] routines to generate JSON data

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 22 Mar 2018 09:38:34 +0100

On Wed, Mar 21 2018, git@xxxxxxxxxxxxxxxxx wrote:

> So, I'm not sure we have a route to get UTF-8-clean data out of Git, and if
> we do it is beyond the scope of this patch series.
>
> So I think for our uses here, defining this as "JSON-like" is probably the
> best answer.  We write the strings as we received them (from the file system,
> the index, or whatever).  These strings are properly escaped WRT double
> quotes, backslashes, and control characters, so we shouldn't have an issue
> with decoders getting out of sync -- only with them rejecting non-UTF-8
> sequences.
>
> We could blindly \uXXXX encode each of the hi-bit characters, if that would
> help the parsers, but I don't want to do that right now.
>
> WRT binary data, I had not intended using this for binary data.  And without
> knowing what kinds or quantity of binary data we might use it for, I'd like
> to ignore this for now.

I agree we should just ignore this problem for now given the immediate
use-case.
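
For reference, the escaping behaviour described in the quoted text could
be sketched roughly as below. This is only an illustration, not the code
from the patch: the name jsonlike_quote() and its fixed-buffer interface
are made up for the example. Double quotes, backslashes, and control
characters are escaped, while bytes >= 0x80 are copied through unchanged,
so a strict decoder would stay in sync but might still reject the output
as invalid UTF-8.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): write `in` to `out` as a
 * double-quoted "JSON-like" string. Returns the number of bytes written,
 * excluding the trailing NUL, or -1 if `out` is too small.
 */
static int jsonlike_quote(const char *in, char *out, size_t outlen)
{
	const unsigned char *p = (const unsigned char *)in;
	size_t n = 0;

	if (outlen < 3)
		return -1; /* need room for two quotes and a NUL */
	out[n++] = '"';

	for (; *p; p++) {
		char buf[7]; /* large enough for "\u00XX" plus NUL */
		size_t len = 2;

		buf[0] = '\\';
		switch (*p) {
		case '"':  buf[1] = '"';  break;
		case '\\': buf[1] = '\\'; break;
		case '\b': buf[1] = 'b';  break;
		case '\f': buf[1] = 'f';  break;
		case '\n': buf[1] = 'n';  break;
		case '\r': buf[1] = 'r';  break;
		case '\t': buf[1] = 't';  break;
		default:
			if (*p < 0x20) {
				/* remaining control characters */
				len = (size_t)snprintf(buf, sizeof(buf),
						       "\\u%04x", (unsigned)*p);
			} else {
				/*
				 * Everything else, including hi-bit bytes
				 * that may not be valid UTF-8, is copied
				 * as received.
				 */
				buf[0] = (char)*p;
				len = 1;
			}
		}

		/* keep room for the closing quote and the NUL */
		if (n + len + 2 > outlen)
			return -1;
		memcpy(out + n, buf, len);
		n += len;
	}

	out[n++] = '"';
	out[n] = '\0';
	return (int)n;
}
```

With this sketch, a path containing a lone 0xff byte comes out as a quoted
string that still contains the raw 0xff, which is the "rejecting non-UTF-8
sequences" case mentioned above.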