Re: [PATCH] ref-filter: sort numerically when ":size" is used

Jeff King <peff@xxxxxxxx> · Fri, 1 Sep 2023 15:16:39 -0400

On Sat, Sep 02, 2023 at 12:29:07AM +0530, Kousik Sanagavarapu wrote:

> > > > I think they are covered implicitly by the "else" block of the
> > > > conditional that checks for FIELD_STR.
> > > 
> > > Ah, OK.  That needs to be future-proofed to force future developers
> > > who want to add different FIELD_FOO type to look at the comparison
> > > logic.  If we want to do so, it should be done as a separate topic
> > > for cleaning-up the mess, not as part of this effort.
> 
> What I also find weird is the fact that we assign a "cmp_type" to the
> whole atom. Like "contents" is FIELD_STR and "objectsize" is "FIELD_ULONG"
> in "valid_atom". This seems wrong because the options of the atoms should be
> the ones deciding the "cmp_type", no?

I think the data structure is a little confusing if you haven't worked
with it before. But basically each "atom" corresponds to a single "%()"
block in the format. So if you ran:

  git for-each-ref --format="%(contents:size) %(contents:body)"

you'd have two atoms in the used_atom struct: one for the size and one
for the body.

IMHO the code would be a lot easier to work with if the atoms were
structured as a parse tree with child pointers (especially when you get
into things like "if" that have sub-expressions). I think one of the
reasons that used_atom is an array is to de-duplicate repeated mentions
(so if you formatted "%(foo) %(foo)" it would only have to store the
computed value once).

But I think that is the wrong way to optimize it. We shouldn't be
storing any strings per-atom, but rather walking the parse tree to
produce a single output buffer. And the values should be cheap to fill
in, because we should parse the object as necessary up front. This is
more or less the way the pretty.c parser does it.

But that is all quite a large tangent from what you're working on, and
would probably be a ground-up rewrite of the formatting code. You can
safely ignore my rant for the purposes of your patch. ;)

> I wanted to leave the "cmp_type" field of the atom untouched because that
> would mess up this "global" setting of "contents" to be a "FIELD_STR" (or
> even "raw" for that matter). Although that seems like a bad idea, after
> I've read Junio's and your comments.

Yeah, I agree that would be a problem if there were one global
"contents". But we are allocating a new atom struct on the fly for the
contents:size directive that we parse, and so on.

-Peff