Re: [PATCH v3 00/15] ref-filter: use parsing functions

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 07 Jan 2016 10:43:41 -0800

Karthik Nayak <karthik.188@xxxxxxxxx> writes:

>> I don't understand the difficulty. It should be easy to manually skip
>> the 'deref' for this one particular case:
>>
>>     const char *name = atom->name;
>>     if (*name == '*')
>>         name++;
>>
>> Which would allow this unnecessarily complicated code from patch 14/15:
>>
>>     if (match_atom_name(atom->name, "subject", &buf) && !buf) {
>>         ...
>>         return;
>>     } else if (match_atom_name(atom->name, "body", &buf) && !buf) {
>>         ...
>>         return;
>>     } if (!match_atom_name(atom->name, "contents", &buf))
>>         die("BUG: parsing non-'contents'");
>>
>> to be simplified to the more easily understood form suggested during
>> review[1] of v2:
>>
>>     if (!strcmp(name, "subject")) {
>>         ...
>>         return;
>>     } else if (!strcmp(name, "body")) {
>>         ...
>>         return;
>>     } else if (!match_atom_name(name,"contents", &buf))
>>         die("BUG: expected 'contents' or 'contents:'");
>>
>> You could also just use (!strcmp("body") || !strcmp("*body")) rather
>> than skipping "*" manually, but the repetition makes that a bit
>> noisier and uglier.
>>
>> [1]: http://article.gmane.org/gmane.comp.version-control.git/282645
>
> Definitely not a difficulty per se. Just that it seems like something
> match_atom_name()
> seems to be fit for. As the function name suggests that we're matching
> the atom name
> and the check for '!buf' indicates that no options are to be included
> for that particular atom.
>
> Also after Junio's suggestion[1], I think It looks better now[2]. But
> either ways, I'm not
> strongly against what you're saying, so my opinion on this matter is
> quite flexible.
>
> [1]: http://article.gmane.org/gmane.comp.version-control.git/283404
> [2]: http://article.gmane.org/gmane.comp.version-control.git/283449

Sorry, but I suspect I was looking at a leaf function without
thinking about the larger picture.

I suspect that the interface to customized parsing function called
by parse_ref_filter_atom() is misdesigned.  I understand that the
overall parsing that starts at verify_ref_format() goes like this:

 * Iterate over the string and find a matching "%(",")" pair.

   - For each such pair found, use parse_ref_filter_atom() on
     what is inside that matching pair.

     - parse_ref_filter_atom() iterates over the table of known
       atoms, and finds the entry in that table.

       Note that at this point, it knows that "%(" is followed by
       'contents' or 'contents:' when it picked the "contents" atom
       from the table, for example.

     - if the entry we found in that table for the atom being parsed
       has a custom parse function, that function is called, but the
       calling convention does not pass the fact that we already
       know what we are seeing inside "%(",")" pair is 'contents',
       for example, and we know what argument it is given if any.

So it appears to me that match_atom_name() is a misguided helper
function that you shouldn't have to use too often.  If the signature
of parse() functions is changed to take not just the atom but the
pointer to its argument (could be NULL, if we are seeing
"%(contents)", for example) that is already available as "formatp"
in the function, then contents_atom_parser() could become more like:

contents_atom_parser(struct used_atom *atom, const char *arg)
{
	if (args)
        	atom->u.contents.option = C_BARE;
	else if (!strcmp(arg, "body"))
        	atom->u.contents.option = C_BODY;
	...
}

and there is no reason for this caller to even look at atom->name or
worry about that it might have the dereferencing asterisk in front.

If we really want to avoid having separate subject_atom_parser() and
body_atom_parser(), they can be folded into the same function and it
becomes necessary to switch on atom->name like you did in the code
being discussed in the quoted part above.  For that, as Eric said,
skipping '*' manually would not be a big deal, as that should not
happen so often in the code _anyway_.  It is not a good idea to
switch on atom->name inside contents_atom_parser() like you did.
You are better off having separate {subject,body}_atom_parser()
functions.

For one thing, you are not reusing or sharing any code by squishing
these three functions into one.  A conceptually larger problem is
that you are adding two extra !strcmp() calls to figure out the
caller _already_ knows (notice I said this is "conceptual", this is
not about performance).  parse_ref_filter_atom() knows that it is a
"%(subject)" or "%(subject:...)" atom, but because you throw away
that information and call contents_atom_parser() by saying that it
is one of the contents, subject or body, the called function has to
redo strcmp in order to figure it out itself.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html