Re: [PATCH v4 05/12] ref-filter: introduce parsing functions for each valid atom

Karthik Nayak <karthik.188@xxxxxxxxx> · Sat, 6 Feb 2016 20:06:59 +0530

On Thu, Feb 4, 2016 at 3:49 AM, Eric Sunshine <sunshine@xxxxxxxxxxxxxx> wrote:
> On Sun, Jan 31, 2016 at 12:42 PM, Karthik Nayak <karthik.188@xxxxxxxxx> wrote:
>> Parsing atoms is done in populate_value(), this is repetitive and
>> hence expensive. Introduce a parsing function which would let us parse
>> atoms beforehand and store the required details into the 'used_atom'
>> structure for further usage.
>>
>> Helped-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
>> Signed-off-by: Karthik Nayak <Karthik.188@xxxxxxxxx>
>> ---
>> diff --git a/ref-filter.c b/ref-filter.c
>> @@ -36,6 +36,7 @@ static int need_color_reset_at_eol;
>>  static struct {
>>         const char *name;
>>         cmp_type cmp_type;
>> +       void (*parser)(struct used_atom *atom, const char *arg);
>
> It's a little bit weird to pass in 'arg' as an additional argument
> considering that the parser already has access to the same information
> via the atom's 'name' field. I guess you're doing it as a convenience
> so that parsers don't have to do the strchr(':') or memchr(':')
> themselves (and because parse_ref_filter_atom() already has the
> information at hand), right? A typical parser interested in a
> (possibly optional) argument currently looks like this:
>
>     void parse_foo(struct used_atom *a, const char *arg) {
>         if (!arg)
>             /* default behavior: arg absent */
>         else
>             /* process arg */
>     }
>
> That doesn't change much if you drop the 'arg' argument:
>
>     void parse_foo(struct used_atom *a) {
>         const char *arg = strchr(a->name, ':')
>         if (!arg)
>             /* default behavior: arg absent */
>         else
>             /* process arg */
>     }
>
> It does mean a very minimal amount of duplicated code (the single
> strchr() line per parser), but it also would remove a bit of the
> complexity which this patch adds to parse_ref_filter_atom(). So, I
> dunno...
>

This change is introduced as a result of he suggestion given from Junio[1].
Although we're adding a little complexity to parse_ref_filter_atom() I feel its
justified by the interface provided as you mentioned. This ensures that parser
functions don't need to implement their own methods for getting the arguments
and can rely on being provided the arguments to them.

1: http://article.gmane.org/gmane.comp.version-control.git/283499

>>  } valid_atom[] = {
>>         { "refname" },
>>         { "objecttype" },
>> @@ -138,10 +140,9 @@ int parse_ref_filter_atom(const char *atom, const char *ep)
>>                  * shouldn't be used for checking against the valid_atom
>>                  * table.
>>                  */
>> -               const char *formatp = strchr(sp, ':');
>> -               if (!formatp || ep < formatp)
>> -                       formatp = ep;
>> -               if (len == formatp - sp && !memcmp(valid_atom[i].name, sp, len))
>> +               arg = memchr(sp, ':', ep - sp);
>
> Why this change from strchr() to memchr()? I understand that you're
> taking advantage of the fact that you know the extent of the string
> via 'sp' and 'ep', however, was the original strchr() doing extra
> work? Even if this change is desirable, it seems somewhat unrelated to
> the overall purpose of this patch, thus might deserves its own.
>

I'm thinking I'll make a patch for that separately. i.e remove strchr()
and introduce memchr() outside the loop.

> Aside from that, although the "expensive" strchr() / memchr() resides
> within the loop, it will always return the same value since it doesn't
> depend upon any condition local to the loop. This implies that it
> ought to be hoisted out of the loop. (This problem is not new to this
> patch; it's already present in the existing code.)
>

Yes, makes sense.

>> +               if ((!arg || len == arg - sp) &&
>> +                   !memcmp(valid_atom[i].name, sp, len))
>>                         break;
>>         }
>>
>> @@ -154,6 +155,10 @@ int parse_ref_filter_atom(const char *atom, const char *ep)
>>         REALLOC_ARRAY(used_atom, used_atom_cnt);
>>         used_atom[at].name = xmemdupz(atom, ep - atom);
>>         used_atom[at].type = valid_atom[i].cmp_type;
>> +       if (arg)
>> +               arg = used_atom[at].name + (arg - atom) + 1;
>
> This is a harder to understand than it ought to be because it's
> difficult to tell at first glance that you don't actually care about
> 'arg' in relation to the original incoming string, but instead use it
> only to compute an offset into the string which is ultimately stored
> in the newly allocated used_atom[]. Re-using 'arg' for a different
> purpose (in a manner of speaking) confuses the issue further.
>
> The intention might be easier to follow if you made it clear that you
> were interested in the *offset* of the argument in the string, rather
> than a pointer into the incoming string which you ultimately don't
> use. A variable named 'arg_offset' might go a long way toward
> clarifying this intention.
>

I hope you mean something like this.

if (arg) {
    int arg_offset;

    arg_offset = (arg - atom) + 1;
    arg = used_atom[at].name + arg_offset;
}

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html