Re: [PATCH 2/2] [GSOC][RFC] ref-filter: introduce enum atom_type

Christian Couder <christian.couder@xxxxxxxxx> · Sun, 9 May 2021 08:21:41 +0200

On Sat, May 8, 2021 at 5:22 PM ZheNing Hu via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> From: ZheNing Hu <adlternative@xxxxxxxxx>
>
> In the original ref-filter design, it will copy the parsed
> atom's name and attributes to `used_atom[i].name` in the
> atom's parsing step, and use it again for string matching
> in the later specific ref attributes filling step. It use
> a lot of string matching to determine which atom we need.
>
> Introduce the enum member `valid_atom.atom_type` which
> record type of each valid_atom, in the first step of the
> atom parsing, `used_atom.atom_type` will record corresponding
> enum value from `valid_atom.atom_type`, and then in specific
> reference attribute filling step, only need to compare the
> value of the `used_atom.atom_type` to judge the atom type.
>
> At the same time, The value of an atom_type is the coordinate
> of its corresponding valid_atom entry, we can quickly index
> to the corresponding valid_atom entry by the atom_type value.

I am not sure it's worth having an atom_type field for each valid_atom
element if the value of that field is already the index of the
element, because then one would always be able to replace
`valid_atom[i].atom_type` with just `i`. Or is it for some kind of
type safety issue?

> Signed-off-by: ZheNing Hu <adlternative@xxxxxxxxx>
> ---
>  ref-filter.c | 192 +++++++++++++++++++++++++++++++++------------------
>  1 file changed, 123 insertions(+), 69 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index f420bae6e5ba..4ce46e70dc99 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -108,6 +108,50 @@ static struct ref_to_worktree_map {
>         struct worktree **worktrees;
>  } ref_to_worktree_map;
>
> +enum atom_type {
> +ATOM_REFNAME,
...
+ATOM_ELSE,
+};

I wonder if the enum should be instead defined like this:

enum atom_type {
ATOM_UNKNOWN = 0,
ATOM_REFNAME,
...
ATOM_ELSE,
ATOM_INVALID, /* should be last */
};

As a struct containing an atom_type would typically be initialized
with 0 after being allocated, `ATOM_UNKNOWN = 0` could ensure that we
can easily distinguish such a struct where the atom_type is known from
such a struct where it is unknown yet.

Having ATOM_INVALID as always the last enum value (even if some new
ones are added later) could help us iterate over the valid atoms using
something like:

for (i = ATOM_UNKNOWN + 1; i < ATOM_INVALID; i++)
        /* do something with valid_atom[i] */;

> +
>  /*
>   * An atom is a valid field atom listed below, possibly prefixed with
>   * a "*" to denote deref_tag().
> @@ -122,6 +166,7 @@ static struct used_atom {
>         const char *name;
>         cmp_type type;
>         info_source source;
> +       enum atom_type atom_type;
>         union {
>                 char color[COLOR_MAXLEN];
>                 struct align align;
> @@ -500,53 +545,54 @@ static int head_atom_parser(const struct ref_format *format, struct used_atom *a
>  }
>
>  static struct {
> +       enum atom_type atom_type;
>         const char *name;
>         info_source source;
>         cmp_type cmp_type;

I can see that the fields are already not in the same order as in
struct used_atom, but my opinion is that it would be better if they
would we as much as possible in the same order. Maybe one day we could
even unify these structs in some way.

Also as discussed above we might not even need to add an atom_type to
valid_atom[].

>         int (*parser)(const struct ref_format *format, struct used_atom *atom,
>                       const char *arg, struct strbuf *err);
>  } valid_atom[] = {

> @@ -628,6 +674,7 @@ static int parse_ref_filter_atom(const struct ref_format *format,
>         at = used_atom_cnt;
>         used_atom_cnt++;
>         REALLOC_ARRAY(used_atom, used_atom_cnt);
> +       used_atom[at].atom_type = valid_atom[i].atom_type;

As discussed above, if the value of an atom_type is the coordinate of
its corresponding valid_atom entry, then here the following would be
simpler:

       used_atom[at].atom_type = i;

>         used_atom[at].name = xmemdupz(atom, ep - atom);
>         used_atom[at].type = valid_atom[i].cmp_type;
>         used_atom[at].source = valid_atom[i].source;
> @@ -652,7 +699,7 @@ static int parse_ref_filter_atom(const struct ref_format *format,
>                 return -1;
>         if (*atom == '*')
>                 need_tagged = 1;
> -       if (!strcmp(valid_atom[i].name, "symref"))
> +       if (valid_atom[i].atom_type == ATOM_SYMREF)

In the same way as above, the above line might be replaced with the simpler:

       if (i == ATOM_SYMREF)

>                 need_symref = 1;
>         return at;
>  }