Re: [PATCH v4 05/12] ref-filter: introduce parsing functions for each valid atom

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Wed, 3 Feb 2016 17:19:41 -0500

On Sun, Jan 31, 2016 at 12:42 PM, Karthik Nayak <karthik.188@xxxxxxxxx> wrote:
> Parsing atoms is done in populate_value(), this is repetitive and
> hence expensive. Introduce a parsing function which would let us parse
> atoms beforehand and store the required details into the 'used_atom'
> structure for further usage.
>
> Helped-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
> Signed-off-by: Karthik Nayak <Karthik.188@xxxxxxxxx>
> ---
> diff --git a/ref-filter.c b/ref-filter.c
> @@ -36,6 +36,7 @@ static int need_color_reset_at_eol;
>  static struct {
>         const char *name;
>         cmp_type cmp_type;
> +       void (*parser)(struct used_atom *atom, const char *arg);

It's a little bit weird to pass in 'arg' as an additional argument
considering that the parser already has access to the same information
via the atom's 'name' field. I guess you're doing it as a convenience
so that parsers don't have to do the strchr(':') or memchr(':')
themselves (and because parse_ref_filter_atom() already has the
information at hand), right? A typical parser interested in a
(possibly optional) argument currently looks like this:

    void parse_foo(struct used_atom *a, const char *arg) {
        if (!arg)
            /* default behavior: arg absent */
        else
            /* process arg */
    }

That doesn't change much if you drop the 'arg' argument:

    void parse_foo(struct used_atom *a) {
        const char *arg = strchr(a->name, ':')
        if (!arg)
            /* default behavior: arg absent */
        else
            /* process arg */
    }

It does mean a very minimal amount of duplicated code (the single
strchr() line per parser), but it also would remove a bit of the
complexity which this patch adds to parse_ref_filter_atom(). So, I
dunno...

>  } valid_atom[] = {
>         { "refname" },
>         { "objecttype" },
> @@ -138,10 +140,9 @@ int parse_ref_filter_atom(const char *atom, const char *ep)
>                  * shouldn't be used for checking against the valid_atom
>                  * table.
>                  */
> -               const char *formatp = strchr(sp, ':');
> -               if (!formatp || ep < formatp)
> -                       formatp = ep;
> -               if (len == formatp - sp && !memcmp(valid_atom[i].name, sp, len))
> +               arg = memchr(sp, ':', ep - sp);

Why this change from strchr() to memchr()? I understand that you're
taking advantage of the fact that you know the extent of the string
via 'sp' and 'ep', however, was the original strchr() doing extra
work? Even if this change is desirable, it seems somewhat unrelated to
the overall purpose of this patch, thus might deserves its own.

Aside from that, although the "expensive" strchr() / memchr() resides
within the loop, it will always return the same value since it doesn't
depend upon any condition local to the loop. This implies that it
ought to be hoisted out of the loop. (This problem is not new to this
patch; it's already present in the existing code.)

> +               if ((!arg || len == arg - sp) &&
> +                   !memcmp(valid_atom[i].name, sp, len))
>                         break;
>         }
>
> @@ -154,6 +155,10 @@ int parse_ref_filter_atom(const char *atom, const char *ep)
>         REALLOC_ARRAY(used_atom, used_atom_cnt);
>         used_atom[at].name = xmemdupz(atom, ep - atom);
>         used_atom[at].type = valid_atom[i].cmp_type;
> +       if (arg)
> +               arg = used_atom[at].name + (arg - atom) + 1;

This is a harder to understand than it ought to be because it's
difficult to tell at first glance that you don't actually care about
'arg' in relation to the original incoming string, but instead use it
only to compute an offset into the string which is ultimately stored
in the newly allocated used_atom[]. Re-using 'arg' for a different
purpose (in a manner of speaking) confuses the issue further.

The intention might be easier to follow if you made it clear that you
were interested in the *offset* of the argument in the string, rather
than a pointer into the incoming string which you ultimately don't
use. A variable named 'arg_offset' might go a long way toward
clarifying this intention.

> +       if (valid_atom[i].parser)
> +               valid_atom[i].parser(&used_atom[at], arg);
>         if (*atom == '*')
>                 need_tagged = 1;
>         if (!strcmp(used_atom[at].name, "symref"))
> --
> 2.7.0
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html