Kousik Sanagavarapu <five231003@xxxxxxxxx> writes: > Atoms like "raw" and "contents" have a ":size" option which can be used > to know the size of the data. Since these atoms have the cmp_type > FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to > '9'. Meaning, even when the ":size" option is used and what we > ultimatlely have is numbers, we still sort alphabetically. There are other cmp_types already defined, like ULONG and TIME. How are they used and affect the comparison? Naively, :size sounds like a good candidate to compare as ULONG, as it cannot be negative even though 0 is a valid size. I understand and agree with the motivation, but the implementation looks puzzling. > diff --git a/ref-filter.c b/ref-filter.c > index 1bfaf20fbf..5d7bea5f23 100644 > --- a/ref-filter.c > +++ b/ref-filter.c > @@ -932,7 +932,13 @@ struct atom_value { > ssize_t s_size; > int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state, > struct strbuf *err); > - uintmax_t value; /* used for sorting when not FIELD_STR */ > + > + /* > + * Used for sorting when not FIELD_STR or when FIELD_STR but the > + * sort should be numeric and not alphabetic. > + */ > + uintmax_t value; This does not explain why we cannot make <anything>:size FIELD_ULONG for <anything> that is of FIELD_STR type, though. IOW, why such a strange "when FIELD_STR but the sort should be numeric" is needed? If you have a <size>, shouldn't it always be numeric? IOW, when you notice that you need to set, say, u.contents.option of an atom to C_LENGTH, shouldn't you set cmp_type of the atom to FIELD_ULONG, somewhere in contents_atom_parser() and friends, and everything should naturally follow, no? It seems that support for other cmp_types are incomplete in the current code. There are FIELD_ULONG and FIELD_TIME defined, but they do not appear to be used in any way, so the cmp_ref_sorting() would need to be updated to make it actually pay attention to the cmp_type and perform numeric comparison. > @@ -1883,8 +1890,10 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp > v->s = strbuf_detach(&sb, NULL); > } else if (atom->u.contents.option == C_BODY_DEP) > v->s = xmemdupz(bodypos, bodylen); > - else if (atom->u.contents.option == C_LENGTH) > - v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos)); > + else if (atom->u.contents.option == C_LENGTH) { > + v->value = (uintmax_t)strlen(subpos); > + v->s = xstrfmt("%"PRIuMAX, v->value); > + } We should take a note that all of these v->value are *per* *item* that will be sorted. > @@ -2986,7 +2996,7 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru > cmp_detached_head = 1; > } else if (s->sort_flags & REF_SORTING_VERSION) { > cmp = versioncmp(va->s, vb->s); > - } else if (cmp_type == FIELD_STR) { > + } else if (cmp_type == FIELD_STR && !va->value && !vb->value) { Two refs may point at an empty object with zero length, i.e. for them, !v->value is true, and another ref may point at a non-empty object. The two empty refs are compared with an algorithm different from the algorithm used to compare the empty ref and the non-empty ref. Isn't this broken as a comparison function to be given to QSORT(), which must be transitive (e.g. if A < B and B < C, then it should be guaranteed that A < C and you do not have to compare A and C)? IOW, the choice of the comparison algorithm should not depend on an attribute (like value or s) that is specific to the item being compared. Things like cmp_type that is defined at the used_atom lefvel to make the sorting function stable, I would think.