Re: [PATCH v4 2/2] sequencer: fix quoting in write_author_script

Eric Sunshine <sunshine@xxxxxxxxxxxxx> · Wed, 8 Aug 2018 04:43:43 -0400

On Tue, Aug 7, 2018 at 9:54 AM Phillip Wood <phillip.wood@xxxxxxxxxxxx> wrote:
> On 07/08/18 11:23, Eric Sunshine wrote:
> > On Tue, Aug 7, 2018 at 5:35 AM Phillip Wood <phillip.wood@xxxxxxxxxxxx> wrote:
> >> +       if (n > 0 && s[n] != '\'')
> >> +               return 1;
> >
> > To be "technically correct", I think the condition in the 'if'
> > statement should be ">=". It should never happen in practice that the
> > entire content of the file is a single character followed by zero or
> > more newlines, but using the proper condition ">=" would save future
> > readers of this code a "huh?" moment.
>
> I'm not sure it is that simple. If the script consists solely of a
> single quote then we should return 1, if it is a single character that
> is not "'" then it should return 0. Currently it returns 0 in both those
> cases so is technically broken when the script is "'". If it used ">="
> instead then I think it would return 0 when it should return 1 and vice
> versa. As you say this shouldn't happen in practice.

It shouldn't happen in practice, but if a human (power user) edits
this file, we shouldn't discount the possibility. However, I'm not so
concerned about quoting_is_broken() returning a meaningful answer in
this case since we have much bigger problems if we get such a file.
(Indeed, what answer could quoting_is_broken() return which would be
useful or meaningful for such a malformed file?)

What does concern me is that read_env_script() doesn't seem to care
about such a malformed file; it doesn't do any validation at all.
Contrast that with read_author_ident() which is pretty strict about
the content it expects to find in the file. So, it might make sense to
upgrade read_env_script() to do some sort of validation on each line
(though that shouldn't be in this patch, and doesn't even need to be
in this series). For instance, it could check that each line looks
something like what would be matched by this regex: /[A-Z0-9_]+='.+'/

(And, no, I'm not saying that regex should be used for validation; I'm
just using it as an example.)

As for '>' vs. '>=', it caused more than a slight hiccup when I was
scanning the code, and I worry that future readers could be similarly
impacted.