On Sat, Sep 02, 2023 at 12:29:07AM +0530, Kousik Sanagavarapu wrote: > > > > I think they are covered implicitly by the "else" block of the > > > > conditional that checks for FIELD_STR. > > > > > > Ah, OK. That needs to be future-proofed to force future developers > > > who want to add different FIELD_FOO type to look at the comparison > > > logic. If we want to do so, it should be done as a separate topic > > > for cleaning-up the mess, not as part of this effort. > > What I also find weird is the fact that we assign a "cmp_type" to the > whole atom. Like "contents" is FIELD_STR and "objectsize" is "FIELD_ULONG" > in "valid_atom". This seems wrong because the options of the atoms should be > the ones deciding the "cmp_type", no? I think the data structure is a little confusing if you haven't worked with it before. But basically each "atom" corresponds to a single "%()" block in the format. So if you ran: git for-each-ref --format="%(contents:size) %(contents:body)" you'd have two atoms in the used_atom struct: one for the size and one for the body. IMHO the code would be a lot easier to work with if the atoms were structured as a parse tree with child pointers (especially when you get into things like "if" that have sub-expressions). I think one of the reasons that used_atom is an array is to de-duplicate repeated mentions (so if you formatted "%(foo) %(foo)" it would only have to store the computed value once). But I think that is the wrong way to optimize it. We shouldn't be storing any strings per-atom, but rather walking the parse tree to produce a single output buffer. And the values should be cheap to fill in, because we should parse the object as necessary up front. This is more or less the way the pretty.c parser does it. But that is all quite a large tangent from what you're working on, and would probably be a ground-up rewrite of the formatting code. You can safely ignore my rant for the purposes of your patch. ;) > I wanted to leave the "cmp_type" field of the atom untouched because that > would mess up this "global" setting of "contents" to be a "FIELD_STR" (or > even "raw" for that matter). Although that seems like a bad idea, after > I've read Junio's and your comments. Yeah, I agree that would be a problem if there were one global "contents". But we are allocating a new atom struct on the fly for the contents:size directive that we parse, and so on. -Peff