Re: [PATCH 01/14] numparse: new module for parsing integral numbers

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Wed, 18 Mar 2015 14:27:56 -0400

On Tuesday, March 17, 2015, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> Implement wrappers for strtol() and strtoul() that are safer and more
> convenient to use.
>
> Signed-off-by: Michael Haggerty <mhagger@xxxxxxxxxxxx>
> ---
> diff --git a/numparse.c b/numparse.c
> new file mode 100644
> index 0000000..90b44ce
> --- /dev/null
> +++ b/numparse.c
> @@ -0,0 +1,180 @@
> +int parse_l(const char *s, unsigned int flags, long *result, char **endptr)
> +{
> +       long l;
> +       const char *end;
> +       int err = 0;
> +
> +       err = parse_precheck(s, &flags);
> +       if (err)
> +               return err;
> +       /*
> +        * Now let strtol() do the heavy lifting:
> +        */

Perhaps use a /* one-line style comment */ to reduce vertical space
consumption a bit, thus make it (very slightly) easier to run the eye
over the code.

> +       errno = 0;
> +       l = strtol(s, (char **)&end, flags & NUM_BASE_MASK);
> +       if (errno) {
> +               if (errno == ERANGE) {
> +                       if (!(flags & NUM_SATURATE))
> +                               return -NUM_SATURATE;
> +               } else {
> +                       return -NUM_OTHER_ERROR;
> +               }
> +       }

Would it reduce cognitive load slightly (and reduce vertical space
consumption) to rephrase the conditionals as:

    if (errno == ERANGE && !(flags & NUM_SATURATE))
        return -NUM_SATURATE;

    if (errno && errno != ERANGE)
        return -NUM_OTHER_ERROR;

or something similar?

More below.

> +       if (end == s)
> +               return -NUM_NO_DIGITS;
> +
> +       if (*end && !(flags & NUM_TRAILING))
> +               return -NUM_TRAILING;
> +
> +       /* Everything was OK */
> +       *result = l;
> +       if (endptr)
> +               *endptr = (char *)end;
> +       return 0;
> +}
> diff --git a/numparse.h b/numparse.h
> new file mode 100644
> index 0000000..4de5e10
> --- /dev/null
> +++ b/numparse.h
> @@ -0,0 +1,207 @@
> +#ifndef NUMPARSE_H
> +#define NUMPARSE_H
> +
> +/*
> + * Functions for parsing integral numbers.
> + *
> + * Examples:
> + *
> + * - Convert s into a signed int, interpreting prefix "0x" to mean
> + *   hexadecimal and "0" to mean octal. If the value doesn't fit in an
> + *   unsigned int, set result to INT_MIN or INT_MAX.

Did you mean s/unsigned int/signed int/ ?

> + *
> + *     if (convert_i(s, NUM_SLOPPY, &result))
> + *             die("...");
> + */
> +
> +/*
> + * The lowest 6 bits of flags hold the numerical base that should be
> + * used to parse the number, 2 <= base <= 36. If base is set to 0,
> + * then NUM_BASE_SPECIFIER must be set too; in this case, the base is
> + * detected automatically from the string's prefix.

Does this restriction go against the goal of making these functions
convenient, even while remaining strict? Is there a strong reason for
not merely inferring NUM_BASE_SPECIFIER when base is 0? Doing so would
make it consistent with strto*l() without (I think) introducing any
ambiguity.

> + */
> +/*
> + * Number parsing functions:
> + *
> + * The following functions parse a number (long, unsigned long, int,
> + * or unsigned int respectively) from the front of s, storing the
> + * value to *result and storing a pointer to the first character after
> + * the number to *endptr. flags specifies how the number should be
> + * parsed, including which base should be used. flags is a combination
> + * of the numerical base (2-36) and the NUM_* constants above (see).

"(see)" what?

> + * Return 0 on success or a negative value if there was an error. On
> + * failure, *result and *entptr are left unchanged.
> + *
> + * Please note that if NUM_TRAILING is not set, then it is
> + * nevertheless an error if there are any characters between the end
> + * of the number and the end of the string.

Again, on the subject of convenience, why this restriction? The stated
purpose of the parse_*() functions is to parse the number from the
front of the string and return a pointer to the first non-numeric
character following. As  a reader of this API, I interpret that as
meaning that NUM_TRAILING is implied. Is there a strong reason for not
inferring NUM_TRAILING for the parse_*() functions at the API level?
(I realize that the convert_*() functions are built atop parse_*(),
but that's an implementation detail.)

> + */
> +
> +int parse_l(const char *s, unsigned int flags,
> +           long *result, char **endptr);

Do we want to perpetuate the ugly (char **) convention for 'endptr'
from strto*l()? Considering that the incoming string is const, it
seems undesirable to return a non-const pointer to some place inside
that string.

> +/*
> + * Number conversion functions:
> + *
> + * The following functions parse a string into a number. They are
> + * identical to the parse_*() functions above, except that the endptr
> + * is not returned. These are most useful when parsing a whole string
> + * into a number; i.e., when (flags & NUM_TRAILING) is unset.

I can formulate arguments for allowing or disallowing NUM_TRAILING
with convert_*(), however, given their purpose of parsing the entire
string into a number, as a reader of the API, I would kind of expect
the convert_*() functions to ensure that NUM_TRAILING is not set
(either by forcibly clearing it or erroring out as an inconsistent
request if it is set).

> + */
> +static inline int convert_l(const char *s, unsigned int flags,
> +                           long *result)
> +{
> +       return parse_l(s, flags, result, NULL);
> +}
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html