Re: [PATCH] apply: avoid out-of-bounds access in fuzzy_matchlines()

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 12 Nov 2017 13:45:47 +0900

René Scharfe <l.s.r@xxxxxx> writes:

> fuzzy_matchlines() uses pointers to the first and last characters of
> two lines to keep track while matching them.  This makes it impossible
> to deal with empty strings.  It accesses characters before the start of
> empty lines.  It can also access characters after the end when checking
> for trailing whitespace in the main loop.
>
> Avoid that by using pointers to the first character and the one *after*
> the last one.  This is well-defined as long as the latter is not
> dereferenced.  Basically rewrite the function based on that premise; it
> becomes much simpler as a result.  There is no need to check for
> leading whitespace outside of the main loop anymore.

I vaguely recall that we were recently bitten by a bug or two caused
by another instance of a <begin,end> pair that deviates from the usual
"closed on the left end, open on the right end" convention somewhere
in the system?
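
To spell out why the half-open convention copes with empty lines, here
is a minimal standalone sketch of the same logic as the patched
function.  The names fuzzy_match_sketch() and fuzzy_match_selfcheck()
are made up for illustration, and the casts to unsigned char are only
there because this sketch assumes the standard <ctype.h> isspace()
rather than the one apply.c uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical standalone copy of the patched logic; not the code in
 * apply.c.  "end" always points one past the last character, so an
 * empty line is simply s == end and nothing is ever dereferenced.
 */
int fuzzy_match_sketch(const char *s1, size_t n1,
		       const char *s2, size_t n2)
{
	const char *end1 = s1 + n1;
	const char *end2 = s2 + n2;

	/* ignore line endings; "s < end" guards the end[-1] access */
	while (s1 < end1 && (end1[-1] == '\r' || end1[-1] == '\n'))
		end1--;
	while (s2 < end2 && (end2[-1] == '\r' || end2[-1] == '\n'))
		end2--;

	while (s1 < end1 && s2 < end2) {
		if (isspace((unsigned char)*s1)) {
			/* whitespace must appear on both sides to match */
			if (!isspace((unsigned char)*s2))
				return 0;
			while (s1 < end1 && isspace((unsigned char)*s1))
				s1++;
			while (s2 < end2 && isspace((unsigned char)*s2))
				s2++;
		} else if (*s1++ != *s2++)
			return 0;
	}

	/* match only if both sides were consumed completely */
	return s1 == end1 && s2 == end2;
}

/* A few cases, including the empty lines the old code mishandled. */
void fuzzy_match_selfcheck(void)
{
	assert(fuzzy_match_sketch("", 0, "", 0));
	assert(fuzzy_match_sketch("\n", 1, "\r\n", 2));
	assert(fuzzy_match_sketch("a b\n", 4, "a   b\r\n", 7));
	assert(!fuzzy_match_sketch("a b", 3, "ab", 2));
	assert(!fuzzy_match_sketch("a", 1, "", 0));
}
```

With <first,last> pointers the zero-length case has no valid "last"
to point at, which is where the out-of-bounds reads came from; with
<begin,end> the same case falls out of the ordinary loop conditions.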

I think the fix to the function is correct, but at the same time, we
would want to clean it up after this fix lands by replacing the
function with the line comparison function we already have in the
xdiff/ layer, so that we (1) reduce the code duplication and (2)
more importantly, are no longer constrained by the (mistakenly
narrow) policy decision we currently seem to have made to support only
"ignore-whitespace-change" and nothing else.  Of course, that should
not be done as part of this fix.  It is strictly a #leftoverbits item.

Thanks.

> Reported-by: Mahmoud Al-Qudsi <mqudsi@xxxxxxxxxxxx>
> Signed-off-by: Rene Scharfe <l.s.r@xxxxxx>
> ---
>  apply.c | 59 ++++++++++++++++++++---------------------------------------
>  1 file changed, 20 insertions(+), 39 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index d676debd59..b8087bd29c 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -300,52 +300,33 @@ static uint32_t hash_line(const char *cp, size_t len)
>  static int fuzzy_matchlines(const char *s1, size_t n1,
>  			    const char *s2, size_t n2)
>  {
> -	const char *last1 = s1 + n1 - 1;
> -	const char *last2 = s2 + n2 - 1;
> -	int result = 0;
> +	const char *end1 = s1 + n1;
> +	const char *end2 = s2 + n2;
>  
>  	/* ignore line endings */
> -	while ((*last1 == '\r') || (*last1 == '\n'))
> -		last1--;
> -	while ((*last2 == '\r') || (*last2 == '\n'))
> -		last2--;
> -
> -	/* skip leading whitespaces, if both begin with whitespace */
> -	if (s1 <= last1 && s2 <= last2 && isspace(*s1) && isspace(*s2)) {
> -		while (isspace(*s1) && (s1 <= last1))
> -			s1++;
> -		while (isspace(*s2) && (s2 <= last2))
> -			s2++;
> -	}
> -	/* early return if both lines are empty */
> -	if ((s1 > last1) && (s2 > last2))
> -		return 1;
> -	while (!result) {
> -		result = *s1++ - *s2++;
> -		/*
> -		 * Skip whitespace inside. We check for whitespace on
> -		 * both buffers because we don't want "a b" to match
> -		 * "ab"
> -		 */
> -		if (isspace(*s1) && isspace(*s2)) {
> -			while (isspace(*s1) && s1 <= last1)
> +	while (s1 < end1 && (end1[-1] == '\r' || end1[-1] == '\n'))
> +		end1--;
> +	while (s2 < end2 && (end2[-1] == '\r' || end2[-1] == '\n'))
> +		end2--;
> +
> +	while (s1 < end1 && s2 < end2) {
> +		if (isspace(*s1)) {
> +			/*
> +			 * Skip whitespace. We check on both buffers
> +			 * because we don't want "a b" to match "ab".
> +			 */
> +			if (!isspace(*s2))
> +				return 0;
> +			while (s1 < end1 && isspace(*s1))
>  				s1++;
> -			while (isspace(*s2) && s2 <= last2)
> +			while (s2 < end2 && isspace(*s2))
>  				s2++;
> -		}
> -		/*
> -		 * If we reached the end on one side only,
> -		 * lines don't match
> -		 */
> -		if (
> -		    ((s2 > last2) && (s1 <= last1)) ||
> -		    ((s1 > last1) && (s2 <= last2)))
> +		} else if (*s1++ != *s2++)
>  			return 0;
> -		if ((s1 > last1) && (s2 > last2))
> -			break;
>  	}
>  
> -	return !result;
> +	/* If we reached the end on one side only, lines don't match. */
> +	return s1 == end1 && s2 == end2;
>  }
>  
>  static void add_line_info(struct image *img, const char *bol, size_t len, unsigned flag)