Re: [PATCH 4/4] config: avoid calling `labs()` on too-large data type

René Scharfe <l.s.r@xxxxxx> · Sat, 22 Jun 2019 12:03:24 +0200

Am 21.06.19 um 20:35 schrieb Johannes Schindelin:
> Hi René,
>
> On Thu, 20 Jun 2019, René Scharfe wrote:
>
>> Subject: [PATCH] config: simplify unit suffix handling
>>
>> parse_unit_factor() checks if a K, M or G is present after a number and
>> multiplies it by 2^10, 2^20 or 2^30, respectively.  One of its callers
>> checks if the result is smaller than the number alone to detect
>> overflows.  The other one passes 1 as the number and does multiplication
>> and overflow check itself in a similar manner.
>>
>> This works, but is inconsistent, and it would break if we added support
>> for a bigger unit factor.  E.g. 16777217T expands to 2^64 + 2^40, which
>> is too big for a 64-bit number.  Modulo 2^64 we get 2^40 == 1TB, which
>> is bigger than the raw number 16777217 == 2^24 + 1, so the overflow
>> would go undetected by that method.
>>
>> Move the multiplication out of parse_unit_factor() and rename it to
>> get_unit_factor() to signify its reduced functionality.  This partially
>
> I do not necessarily think that the name `get_unit_factor()` is better,
> given that we still parse the unit factor. I'd vote for keeping the
> original name.

get_unit_factor() is the original name from before c8deb5a146.

> However, what _does_ make sense is to change that function to _really_
> only parse the unit factor. That is, I would keep the exact signature, I
> just would not multiply `*val` by the unit factor, I would overwrite it by
> the unit factor instead.

So the patch is too big.  Its narrative of "let's restore the original
code, but keep the good features added since" is not carrying the
weight of its many changes.

> At least that is what I would have expected, reading the name
> `parse_unit_factor()`.

Hence the renaming. :)

When I read parse_unit_factor() without any context then I expect it to
work in the middle of a string, telling the caller how many characters
were recognized.  It would then be usable with different units, e.g.
for "17KB" just as well as for "100Mbps".

We don't need such a generic function here, of course.

>> reverts c8deb5a146 ("Improve error messages when int/long cannot be
>> parsed from config", 2007-12-25), but keeps the improved error messages.
>> Use a return value of 0 to signal an invalid suffix.
>
> This comment should probably become a code comment above the function.

You mean just the last sentence, right?  For an exported function I'd
agree, but for this short helper I'm not so sure and would rather not
bother the reader with easily inferable facts.

>> Signed-off-by: Rene Scharfe <l.s.r@xxxxxx>
>
> What, no accent?

I prefer a recognizable simplified version to a butchered one.  Perhaps
the world is ready for Unicode now?  I still get weirdly transformed
characters on letters and parcels, so I'm cautious.  Testing the waters
with the sender name setting in my MUA for some time now..

>> diff --git a/config.c b/config.c
>> index 01c6e9df23..61a8bbb5cd 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -834,51 +834,46 @@ static int git_parse_source(config_fn_t fn, void *data,
>>  	return error_return;
>>  }
>>
>> -static int parse_unit_factor(const char *end, uintmax_t *val)
>> +static uintmax_t get_unit_factor(const char *end)
>
> It has been a historical wart that the parameter was called `end`. Maybe
> that could be fixed, "while at it"?

I was tempted to do that, and am a bit proud of having resisted that
one.  I try to avoid "what at it" these days -- if it's worth doing
that other thing then it can live in its own patch.

But the name "end" is arguably good, as it signifies that the function
only works with unit factors at the end of strings.

>>
>>  		errno = 0;
>>  		val = strtoumax(value, &end, 0);
>>  		if (errno == ERANGE)
>>  			return 0;
>> -		oldval = val;
>> -		if (!parse_unit_factor(end, &val)) {
>> +		factor = get_unit_factor(end);
>> +		if (!factor) {
>
> Again, here I would strongly suggest the less intrusive change (with a
> more intuitive outcome):
>
> -		oldval = val;
> -		if (!parse_unit_factor(end, &val)) {
> +		if (!parse_unit_factor(end, &factor)) {
>
>>  			errno = EINVAL;
>>  			return 0;
>>  		}
>> -		if (val > max || oldval > val) {
>> +		if (unsigned_mult_overflows(factor, val) ||
>> +		    factor * val > max) {

I'll split that out, then we can discuss it separately.

René