Re: [PATCH v5 2/4] lib: vsprintf: Fix handling of number field widths in vsscanf

Richard Fitzgerald <rf@xxxxxxxxxxxxxxxxxxxxx> · Mon, 8 Feb 2021 17:38:29 +0000

On 08/02/2021 15:18, Andy Shevchenko wrote:
On Mon, Feb 08, 2021 at 02:01:52PM +0000, Richard Fitzgerald wrote:
The existing code attempted to handle numbers by doing a strto[u]l(),
ignoring the field width, and then repeatedly dividing to extract the
field out of the full converted value. If the string contains a run of
valid digits longer than will fit in a long or long long, this would
overflow and no amount of dividing can recover the correct value.

This patch fixes vsscanf() to obey number field widths when parsing
the number.

A new _parse_integer_limit() is added that takes a limit for the number
of characters to parse. The number field conversion in vsscanf is changed
to use this new function.

If a number starts with a radix prefix, the field width  must be long
enough for at last one digit after the prefix. If not, it will be handled
like this:

  sscanf("0x4", "%1i", &i): i=0, scanning continues with the 'x'
  sscanf("0x4", "%2i", &i): i=0, scanning continues with the '4'

This is consistent with the observed behaviour of userland sscanf.

Note that this patch does NOT fix the problem of a single field value
overflowing the target type. So for example:

   sscanf("123456789abcdef", "%x", &i);

Will not produce the correct result because the value obviously overflows
INT_MAX. But sscanf will report a successful conversion.

I have a few nit-picks, but it's up to you and maintainers how to proceed.

...

-unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
+static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
+					   char **endp, unsigned int base)
  {
-	unsigned long long result;
+	const char *cp;
+	unsigned long long result = 0ULL;
  	unsigned int rv;

-	cp = _parse_integer_fixup_radix(cp, &base);
-	rv = _parse_integer(cp, base, &result);
+	cp = _parse_integer_fixup_radix(startp, &base);
+	if ((cp - startp) >= max_chars) {
+		cp = startp + max_chars;
+		goto out;
+	}
+
+	max_chars -= (cp - startp);
+	rv = _parse_integer_limit(cp, base, &result, max_chars);
  	/* FIXME */
  	cp += (rv & ~KSTRTOX_OVERFLOW);

+out:
  	if (endp)
  		*endp = (char *)cp;

  	return result;
  }

A nit-pick: What if we rewrite above as

static unsigned long long simple_strntoull(const char *cp, size_t max_chars,
					   char **endp, unsigned int base)
{
	unsigned long long result = 0ULL;
	const char *startp = cp;
	unsigned int rv;
	size_t chars;

	cp = _parse_integer_fixup_radix(cp, &base);
	chars = cp - startp;
	if (chars >= max_chars) {
		/* We hit the limit */
		cp = startp + max_chars;
	} else {
		rv = _parse_integer_limit(cp, base, &result, max_chars - chars);
		/* FIXME */
		cp += (rv & ~KSTRTOX_OVERFLOW);
	}

	if (endp)
		*endp = (char *)cp;

	return result;
}

...

I don't mind rewriting that code if you prefer that way.
I am used to working on other kernel subsytems where the preference is
to bail out on the error case so that the "normal" case flows without
nesting.

+static long long simple_strntoll(const char *cp, size_t max_chars, char **endp,
+				 unsigned int base)
+{
+	/*
+	 * simple_strntoull safely handles receiving max_chars==0 in the
+	 * case we start with max_chars==1 and find a '-' prefix.

A nit-pick: Spaces surrounding '=='? simple_strntoull -> simple_strntoull()?

+	 */

Above misses to add something like:

"Otherwise we hit the '-' as an illegal number in the following
simple_strntoull() call."

+	if (*cp == '-' && max_chars > 0)
+		return -simple_strntoull(cp + 1, max_chars - 1, endp, base);
+
+	return simple_strntoull(cp, max_chars, endp, base);

+}

...

+			val.s = simple_strntoll(str,
+						field_width > 0 ? field_width : SIZE_MAX,
+						&next, base);

A nit-pick: Wouldn't be negative field_width "big enough" to just being used as

field_width is s16 so really should be sign-extended to make it "very
big". I think this would be less readable what the intention is and what
assumptions it is based on. There's a risk someone would look at

(size_t)(long)field_width

and think the (long) is redundant.
Perhaps change field_width to int? There I ask myself "if it can be an
int, why is it declared s16?" and worry there is something subtle in the
code.

My personal preference is to avoid using tricks in code that isn't time
critical.

> is? Also, is field_width == 0 should be treated as "parse to the MAX"?
>
> ...

Earlier code terminates scanning if the width parsed from the format
string is <= 0. So field_width can only be -1 or > 0 here. But now you
point it out, that test would be better as field_width >= 0 ... so
it deals with 0 if it ever happened to sneak through to here somehow.

+			val.u = simple_strntoull(str,
+						 field_width > 0 ? field_width : SIZE_MAX,
+						 &next, base);

Ditto.