Re: [PATCH 2/2] [GSOC][RFC] ref-filter: introduce enum atom_type

ZheNing Hu <adlternative@xxxxxxxxx> · Sun, 9 May 2021 21:40:37 +0800

Christian Couder <christian.couder@xxxxxxxxx> wrote on Sun, May 9, 2021 at 2:21 PM:
>
> > At the same time, the value of an atom_type is the coordinate
> > of its corresponding valid_atom entry, so we can quickly index
> > to the corresponding valid_atom entry by the atom_type value.
>
> I am not sure it's worth having an atom_type field for each valid_atom
> element if the value of that field is already the index of the
> element, because then one would always be able to replace
> `valid_atom[i].atom_type` with just `i`. Or is it for some kind of
> type safety issue?
>

Well, I think the type safety concern here is just to make sure that used_atom
and valid_atom are correctly mapped through atom_type. We don't want a developer
to forget to update "enum atom_type" when adding new atoms to valid_atom in the
future. So maybe Junio's suggestion is reasonable: we delete the atom_type member
from valid_atom, but keep the connection between atom_type and the valid_atom
items by using the atom_type values as array indices.
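
To make the idea concrete, here is a minimal standalone sketch (not the
actual patch) of tying each enum value to its table slot with designated
initializers. The three-atom list, the ATOM_COUNT sentinel and the
find_atom() helper are made up for illustration; the real valid_atom[]
has more fields and many more entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened atom list; the real one is much longer. */
enum atom_type {
	ATOM_REFNAME,
	ATOM_OBJECTTYPE,
	ATOM_SYMREF,
	ATOM_COUNT /* hypothetical sentinel: number of atoms */
};

/* Each entry is placed at the index named by its enum value. */
static const struct {
	const char *name;
} valid_atom[] = {
	[ATOM_REFNAME] = { "refname" },
	[ATOM_OBJECTTYPE] = { "objecttype" },
	[ATOM_SYMREF] = { "symref" },
};

/*
 * Fails to compile if an enumerator is appended before ATOM_COUNT
 * without a matching table entry. A missing entry in the middle is
 * not caught here; that slot would just be zero-initialized.
 */
_Static_assert(sizeof(valid_atom) / sizeof(valid_atom[0]) == ATOM_COUNT,
	       "valid_atom[] and enum atom_type are out of sync");

/* Return the index (which is also the atom_type) of 'name', or -1. */
static int find_atom(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < ATOM_COUNT; i++)
		if (valid_atom[i].name && !strcmp(valid_atom[i].name, name))
			return i;
	return -1;
}
```

With a table laid out like this, the index returned by the lookup can be
stored directly as the atom_type, so the table itself does not need an
atom_type member.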

> I wonder if the enum should be instead defined like this:
>
> enum atom_type {
> ATOM_UNKNOWN = 0,
> ATOM_REFNAME,
> ...
> ATOM_ELSE,
> ATOM_INVALID, /* should be last */
> };
>
> As a struct containing an atom_type would typically be initialized
> with 0 after being allocated, `ATOM_UNKNOWN = 0` could ensure that we
> can easily distinguish such a struct where the atom_type is known from
> such a struct where it is unknown yet.
>
> Having ATOM_INVALID as always the last enum value (even if some new
> ones are added later) could help us iterate over the valid atoms using
> something like:
>
> for (i = ATOM_UNKNOWN + 1; i < ATOM_INVALID; i++)
>         /* do something with valid_atom[i] */;
>

Thanks, this suggestion is good!
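
Just to check my understanding, a small standalone sketch of that layout
follows. The atom_names[] table, struct demo_atom and the helper
functions are hypothetical names for illustration only. Note that with
ATOM_UNKNOWN = 0, slot 0 of a table indexed by atom_type is unused; that
is an assumption of this sketch, not something decided in this thread.

```c
#include <assert.h>
#include <stddef.h>

enum atom_type {
	ATOM_UNKNOWN = 0,
	ATOM_REFNAME,
	ATOM_OBJECTTYPE,
	ATOM_SYMREF,
	ATOM_INVALID /* should be last */
};

/* Hypothetical table indexed by atom_type; slot 0 stays NULL. */
static const char *const atom_names[ATOM_INVALID] = {
	[ATOM_REFNAME] = "refname",
	[ATOM_OBJECTTYPE] = "objecttype",
	[ATOM_SYMREF] = "symref",
};

/* Hypothetical stand-in for a struct that carries an atom_type. */
struct demo_atom {
	const char *name;
	enum atom_type atom_type;
};

/* A zero-initialized struct reads as "type not known yet". */
static int atom_type_is_known(const struct demo_atom *atom)
{
	return atom->atom_type != ATOM_UNKNOWN;
}

/* Look up the name for a real atom type (not a sentinel). */
static const char *atom_name(enum atom_type type)
{
	assert(type > ATOM_UNKNOWN && type < ATOM_INVALID);
	return atom_names[type];
}

/* Walk only the real atoms, skipping both sentinels. */
static int count_named_atoms(void)
{
	int i, n = 0;

	for (i = ATOM_UNKNOWN + 1; i < ATOM_INVALID; i++)
		if (atom_names[i])
			n++;
	return n;
}
```

Here the loop bounds never need to change when new atoms are added
between ATOM_UNKNOWN and ATOM_INVALID.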

> > +
> >  /*
> >   * An atom is a valid field atom listed below, possibly prefixed with
> >   * a "*" to denote deref_tag().
> > @@ -122,6 +166,7 @@ static struct used_atom {
> >         const char *name;
> >         cmp_type type;
> >         info_source source;
> > +       enum atom_type atom_type;
> >         union {
> >                 char color[COLOR_MAXLEN];
> >                 struct align align;
> > @@ -500,53 +545,54 @@ static int head_atom_parser(const struct ref_format *format, struct used_atom *a
> >  }
> >
> >  static struct {
> > +       enum atom_type atom_type;
> >         const char *name;
> >         info_source source;
> >         cmp_type cmp_type;
>
> I can see that the fields are already not in the same order as in
> struct used_atom, but my opinion is that it would be better if they
> would be as much as possible in the same order. Maybe one day we could
> even unify these structs in some way.
>

Yes, atom_value, valid_atom and used_atom may be difficult to follow on a
first read. Maybe unifying them is a good direction for the future.

> Also as discussed above we might not even need to add an atom_type to
> valid_atom[].
>

OK.

> >         int (*parser)(const struct ref_format *format, struct used_atom *atom,
> >                       const char *arg, struct strbuf *err);
> >  } valid_atom[] = {
>
> > @@ -628,6 +674,7 @@ static int parse_ref_filter_atom(const struct ref_format *format,
> >         at = used_atom_cnt;
> >         used_atom_cnt++;
> >         REALLOC_ARRAY(used_atom, used_atom_cnt);
> > +       used_atom[at].atom_type = valid_atom[i].atom_type;
>
> As discussed above, if the value of an atom_type is the coordinate of
> its corresponding valid_atom entry, then here the following would be
> simpler:
>
>        used_atom[at].atom_type = i;
>

I agree.

> >         used_atom[at].name = xmemdupz(atom, ep - atom);
> >         used_atom[at].type = valid_atom[i].cmp_type;
> >         used_atom[at].source = valid_atom[i].source;
> > @@ -652,7 +699,7 @@ static int parse_ref_filter_atom(const struct ref_format *format,
> >                 return -1;
> >         if (*atom == '*')
> >                 need_tagged = 1;
> > -       if (!strcmp(valid_atom[i].name, "symref"))
> > +       if (valid_atom[i].atom_type == ATOM_SYMREF)
>
> In the same way as above, the above line might be replaced with the simpler:
>
>        if (i == ATOM_SYMREF)
>
> >                 need_symref = 1;
> >         return at;
> >  }

Thanks!
--
ZheNing Hu