Re: Pretty output in JSON format

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Thu, 26 Sep 2024 21:04:08 +0000

On 2024-09-25 at 18:45:54, Sean Allred wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
> >> What do y'all think?
> >
> > I think this is ultimately a bad idea.  JSON requires that the output be
> > UTF-8, but Git processes a large amount of data, including file names,
> > ref names, commit messages, author and committer identities, diff
> > output, and other file contents, that are not restricted to UTF-8.
> 
> This strikes me with a little bit of 'perfect as the enemy of good'
> here. I'm sure there are ways to signal an encoding failure. I would,
> however, caution against trying to provide diff output in JSON. That
> just seems... odd. Maybe base64 it first? (I don't know -- I just
> struggle to see the use-case here.)

I understand JSON output would be useful, but it's also not useful for
git for-each-ref (for example) to fail randomly because someone has a
non-UTF-8 ref, or for git log to fail because of encoding problems
(which absolutely is a problem in the Linux kernel tree).  "It works
most of the time, but seemingly randomly fails" is not a good user
experience, and I'm opposed to adding serialization formats that do
that.  (For that reason, a just-send-bytes approach that occasionally
produces invalid JSON is also unacceptable.)

If we always base64-encoded or percent-encoded the things that aren't
guaranteed to be UTF-8, then we could well create JSON.  However, that
makes working with the data structure in most scripting languages a pain
since there's no automatic decoding of this data.  In strongly typed
languages like Rust, it's possible to do this decoding with no problem,
but I don't expect most users who'd want this feature to be using them.
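A rough Python sketch of the always-encode approach follows. It assumes a hypothetical output shape with a single `refname` field, which is not an actual Git format. Percent-encoding or base64 keeps the JSON valid for every possible ref name. The cost falls on the consumer: `json.loads()` returns the encoded string, and decoding it is a separate, manual step.

```python
import base64
import json
from urllib.parse import quote_from_bytes, unquote_to_bytes

def refs_to_json_percent(refnames: list[bytes]) -> str:
    """Percent-encode every ref name, so the output is always valid (ASCII) JSON."""
    # safe="/" keeps the path separators readable; every byte other than
    # ASCII letters, digits, "_.-~" and "/" becomes %XX.
    return json.dumps([{"refname": quote_from_bytes(n, safe="/")} for n in refnames])

def refs_to_json_base64(refnames: list[bytes]) -> str:
    """Alternative: base64-encode every ref name; always valid, but unreadable."""
    return json.dumps([{"refname": base64.b64encode(n).decode("ascii")} for n in refnames])

def refs_from_json_percent(doc: str) -> list[bytes]:
    """json.loads() does not undo the encoding; each consumer must do it by hand."""
    return [unquote_to_bytes(entry["refname"]) for entry in json.loads(doc)]

refs = [b"refs/heads/main", b"refs/heads/caf\xe9"]
doc = refs_to_json_percent(refs)
print(doc)
print(json.loads(doc)[1]["refname"])        # still percent-encoded after parsing
print(refs_from_json_percent(doc) == refs)  # round-trips only with explicit decoding
```

Nothing in the parsed data tells a script which fields need decoding; it simply has to know. That is the inconvenience for scripting languages. A typed consumer can instead bake this knowledge into its deserializer.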
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA