Re: [PATCH] ref-filter: sort numerically when ":size" is used

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 01 Sep 2023 13:04:08 -0700

Kousik Sanagavarapu <five231003@xxxxxxxxx> writes:

> What I also find weird is the fact that we assign a "cmp_type" to the
> whole atom. Like "contents" is FIELD_STR and "objectsize" is "FIELD_ULONG"
> in "valid_atom". This seems wrong because the options of the atoms should be
> the ones deciding the "cmp_type", no?

I do not quite get where your confusion comes from.

The valid_atom[] array is used purely for cataloguing things like
"contents", "refname", etc., before specialization, as opposed to
used_atom[], which lists the actual specialized forms of the atoms
used in the format string.  If you refer to "contents:body" and
"contents:size" in your format string, they become two entries in
used_atom[], both of which refer to the same atom_type obtained by
consulting the same entry in the valid_atom[] array.

The specialization between "contents:body" and "contents:size" must
be captured somewhere, and that happens by using two used_atom[]
entries.  There will be one "struct atom_value" for each of these
placeholders, each of which refers to its own used_atom that knows
for which variant of "contents" it was created.  Of course, these
two "struct atom_value" instances will have different content strings
for the same ref (one stores the body part of the string, the other
stores the size of the contents).
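
To illustrate the relationship described above, here is a minimal
sketch.  The struct layouts and the parse_used_atom() helper are
simplified stand-ins made up for illustration, not the actual
definitions in ref-filter.c; only the names valid_atom[], used_atom[]
and atom_type come from the discussion.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins; NOT the actual definitions in ref-filter.c. */
enum atom_type { ATOM_REFNAME, ATOM_OBJECTSIZE, ATOM_CONTENTS };

/* Catalogue of base atoms, before any ":option" specialization. */
static const struct {
	enum atom_type type;
	const char *name;
} valid_atom[] = {
	{ ATOM_REFNAME, "refname" },
	{ ATOM_OBJECTSIZE, "objectsize" },
	{ ATOM_CONTENTS, "contents" },
};

/* One entry per specialized form that appears in the format string. */
struct used_atom {
	enum atom_type atom_type; /* which valid_atom[] entry it came from */
	const char *option;       /* hypothetical: text after ':', or NULL */
};

/*
 * Hypothetical helper: split "name:option", look the name up in
 * valid_atom[], and fill in a used_atom entry.  Returns 0 on success
 * and -1 when the base name is not in the catalogue.
 */
static int parse_used_atom(const char *spec, struct used_atom *out)
{
	const char *colon = strchr(spec, ':');
	size_t len = colon ? (size_t)(colon - spec) : strlen(spec);
	size_t i;

	for (i = 0; i < sizeof(valid_atom) / sizeof(valid_atom[0]); i++) {
		if (strlen(valid_atom[i].name) == len &&
		    !strncmp(valid_atom[i].name, spec, len)) {
			out->atom_type = valid_atom[i].type;
			out->option = colon ? colon + 1 : NULL;
			return 0;
		}
	}
	return -1;
}
```

Parsing "contents:body" and "contents:size" with this helper yields
two distinct used_atom entries that share the same atom_type and
differ only in the option they remember.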

> I wanted to leave the "cmp_type" field of the atom untouched because that
> would mess up this "global" setting of "contents" to be a "FIELD_STR" (or
> even "raw" for that matter).

We are not talking about futzing with the valid_atom[] array.

Because the used_atom[] array is designed to capture the
differences among "contents" vs "contents:body" vs "contents:size",
the type of a value that uses an entry in the used_atom[] array
(i.e. an instance of "struct atom_value") should be decided using
the information stored there.
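
A rough sketch of that idea follows, assuming the comparison type is
chosen per specialized form and not per base atom.  FIELD_STR and
FIELD_ULONG are the names used in this thread; cmp_type_for() and
compare_values() are hypothetical helpers, and the rule "a ':size'
option means numeric" is an assumption made for illustration only.

```c
#include <stdlib.h>
#include <string.h>

enum cmp_type { FIELD_STR, FIELD_ULONG };

/*
 * Hypothetical: choose the comparison type from the specialized form
 * of the atom, so "contents:size" can be numeric while "contents" and
 * "contents:body" stay strings.
 */
static enum cmp_type cmp_type_for(const char *name)
{
	const char *colon = strchr(name, ':');

	if (colon && !strcmp(colon + 1, "size"))
		return FIELD_ULONG;
	if (!strcmp(name, "objectsize"))
		return FIELD_ULONG;
	return FIELD_STR;
}

/*
 * Hypothetical: compare two values, stored as strings, according to
 * the comparison type.  Returns <0, 0 or >0 like strcmp().
 */
static int compare_values(enum cmp_type type, const char *a, const char *b)
{
	if (type == FIELD_ULONG) {
		unsigned long va = strtoul(a, NULL, 10);
		unsigned long vb = strtoul(b, NULL, 10);

		return (va > vb) - (va < vb);
	}
	return strcmp(a, b);
}
```

Compared as strings, "100" sorts before "9" because '1' is less than
'9'; compared as FIELD_ULONG, 100 sorts after 9, which is the order
one expects when sorting by size.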

I agree that Peff's "the value for 'contents:size' we know is
numeric, so only store the numeric value in atom_value and let the
output logic handle that using cmp_type information" sounds very
tempting.  If we were to tackle it, however, I think it should be a
separate topic.

In any case, it was very good that you noticed we do not sort
numerically when sorting by size (I guess our sort by timestamp
wasn't affected only because we have been lucky?).  Thanks for
starting this topic.