Re: Pretty output in JSON format

On Thu, 26 Sept 2024 at 22:04, brian m. carlson
<sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2024-09-25 at 18:45:54, Sean Allred wrote:
> > "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> >
> > > On 2024-09-24 at 21:52:35, Ron Ziroby Romero wrote:
> > >> What do y'all think?
> > >
> > > I think this is ultimately a bad idea.  JSON requires that the output be
> > > UTF-8, but Git processes a large amount of data, including file names,
> > > ref names, commit messages, author and committer identities, diff
> > > output, and other file contents, that are not restricted to UTF-8.
> >
> > This strikes me with a little bit of 'perfect as the enemy of good'
> > here. I'm sure there are ways to signal an encoding failure. I would,
> > however, caution against trying to provide diff output in JSON. That
> > just seems... odd. Maybe base64 it first? (I don't know -- I just
> > struggle to see the use-case here.)
>
> I understand JSON output would be useful, but it's also not useful to
> randomly fail to do git for-each-ref (for example) because someone has a
> non-UTF-8 ref, or to fail to do a git log because of encoding problems
> (which absolutely is a problem in the Linux kernel tree).  "It works
> most of the time, but seemingly randomly fails" is not a good user
> experience, and I'm opposed to adding serialization formats that do
> that.  (For that reason, just-send-bytes that produces invalid JSON on
> occasion is also unacceptable.)
>
> If we always base64-encoded or percent-encoded the things that aren't
> guaranteed to be UTF-8, then we could well create JSON.  However, that
> makes working with the data structure in most scripting languages a pain
> since there's no automatic decoding of this data.  In strongly typed
> languages like Rust, it's possible to do this decoding with no problem,
> but I expect that's not most users who'd want this feature.

I do plan on percent-encoding all non-UTF-8 data.  A good way to
check this feature would be to run "git log --pretty=json" on the
Linux kernel and make sure we get a valid, though massive, UTF-8 JSON
file. (Not as an automated test, but as a way to check that we've
covered everything. Any stumbling blocks we hit should become
automated tests.) The use case I have in mind is piping the data to
jq for further processing.
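
For illustration, here's a rough Python sketch of the kind of
percent-encoding scheme I have in mind. The exact rules are still
open, and the "refname" field below is made up for the example: any
byte that isn't part of valid UTF-8, plus the literal '%', becomes a
%XX escape, so the output is always valid UTF-8 and a consumer can
recover the original bytes.

import json

def percent_encode(raw):
    """Return a UTF-8 str that can round-trip arbitrary bytes."""
    out = []
    i = 0
    while i < len(raw):
        # Take the shortest prefix at this position that is one valid
        # UTF-8 character (1 to 4 bytes).
        for width in (1, 2, 3, 4):
            chunk = raw[i:i + width]
            try:
                ch = chunk.decode("utf-8")
            except UnicodeDecodeError:
                continue
            out.append("%25" if ch == "%" else ch)  # escape the escape char
            i += width
            break
        else:
            out.append("%%%02X" % raw[i])           # byte that is not UTF-8
            i += 1
    return "".join(out)

def percent_decode(text):
    """Invert percent_encode, recovering the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(text[i].encode("utf-8"))
            i += 1
    return bytes(out)

# A ref name containing a stray Latin-1 byte (0xE9) that is not UTF-8.
ref = b"refs/heads/caf\xe9"
record = {"refname": percent_encode(ref)}  # hypothetical field name
print(json.dumps(record))                  # always valid UTF-8 JSON
assert percent_decode(record["refname"]) == ref

Something along those lines keeps jq happy on the consuming side, at
the cost of the manual decode step brian mentions; how much gets
escaped is one of the things the formal proposal should pin down.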

CBOR output seems useful, but I see it as a follow-up project. JSON
output would benefit more people, so I feel we should tackle it
first.

> >> What do y'all think?
> As with all things, I'd suggest you draw up a more formal proposal of
> exactly how this would work, and then that proposal can be discussed.
> How would you use this option? What would its behavior be? What's in
> scope? What's _not_ in scope? :-)

OK, I'll start working on a more formal proposal.

--
Ron Ziroby Romero



