Re: [PATCH 00/14] numparse module: systematically tighten up integer parsing

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Tue, 24 Mar 2015 17:06:12 +0100

On 03/19/2015 08:32 AM, Junio C Hamano wrote:
> Jeff King <peff@xxxxxxxx> writes:
> 
>> I wonder how much of the boilerplate in the parse_* functions could be
>> factored out to use a uintmax_t, with the caller just providing the
>> range. That would make it easier to add new types like off_t, and
>> possibly even constrained types (e.g., an integer from 0 to 100). On the
>> other hand, you mentioned to me elsewhere that there may be some bugs in
>> the range-checks of config.c's integer parsing. I suspect they are
>> related to exactly this kind of refactoring, so perhaps writing things
>> out is best.
> 
> I like this idea very well.  I wonder if we can implement the family
> of
> 
>     parse_{type}(const char *, unsigned int flags,
>     		 const char **endptr, {type} *result)
> 
> functions by calls a helper that internally deals with the numbers
> in uintmax_t, and then checks if the value fits within the possible
> range of the *result before returning.
> 
>     int parse_i(const char *str, unsigned flags,
> 		const char **endptr, int *result) {
> 	uintmax_t val;
>         int sign = 1, status;
>         if (*str == '-') {
> 		sign = -1; 
>                 str++;
> 	}
>         status = parse_num(str, flags, endptr, &val, INT_MAX);
>         if (status < 0)
>         	return status;
> 	*result = sign < 0 ? -val : val;
>         return 0;
>     }
> 
> (assuming the last parameter to parse_num is used to check if the
> result fits within that range).  Or that may be easier and cleaner
> to be done in the caller with or something like that:
> 
> 	status = parse_num(str, flags, endptr, &val);
>         if (status < 0)
>         	return status;
> 	if (INT_MAX <= val * sign || val * sign <= INT_MIN) {
>         	errno = ERANGE;
>                 return -1;
> 	}
> 
> If we wanted to, we may even be able to avoid duplication of
> boilerplate by wrapping the above pattern within a macro,
> parameterizing the TYPE_{MIN,MAX} and using token pasting, to
> expand to four necessary result types.
> 
> There is no reason for the implementation of the parse_num() helper
> to be using strtoul(3) or strtoull(3); its behaviour will be under
> our total control.  It can become fancier by enriching the flags
> bits (e.g. allow scaling factor, etc.) only once and all variants
> for various result types will get the same feature.

Parsing numbers is not rocket science, but there are a lot of pitfalls,
especially around overflow. It's even harder to write such code via
macros and the result is less readable.

This patch series is mostly about finding a reasonable API and whipping
the callers into shape. That seems ambitious enough for me. I'd rather
stick with boring wrappers for now and lean on strtol()/strtoul() to do
the dirty work. It will be easy for a motivated person to change the
implementation later.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html