Re: [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints

Torsten Bögershausen <tboegi@xxxxxx> · Thu, 04 Jul 2013 21:58:08 +0200

On 2013-07-04 19.19, brian m. carlson wrote:
> The commit code already contains code for validating UTF-8, but it does not
> check for invalid values, such as guaranteed non-characters and surrogates.  Fix
s/guaranteed non-characters/code points out of range/
> this by explicitly checking for and rejecting such characters.
Do we really reject them, or do we only warn about them?

Another question:
Now that we check for code points out of range (beyond U+10FFFF),
do we want an additional test case for that?
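
For illustration, such a test case could feed the commit a message containing
U+110000, the first code point past U+10FFFF, which encodes as the bytes
F4 90 80 80. The snippet below is only a sketch of how to produce and inspect
those bytes; the variable names are made up, and wiring the file into
"git commit -F" would follow the surrogate test quoted below.

```shell
# Hypothetical sketch: write a commit message that ends in U+110000,
# encoded as the four bytes F4 90 80 80 (octal 364 220 200 200).
msg_file=$(mktemp)
printf 'Commit message\n\nOut of range:\364\220\200\200\n' >"$msg_file"

# Dump the last five bytes (the four encoded bytes plus the newline) as hex.
out_of_range_hex=$(tail -c 5 "$msg_file" | od -An -tx1 | tr -d ' \n')
rm -f "$msg_file"

echo "$out_of_range_hex"
```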

> +test_expect_success 'UTF-8 invalid characters refused' '
Maybe:
 test_expect_success 'UTF-8 invalid surrogate' '

> +	test_when_finished "rm -f $HOME/stderr $HOME/invalid" && 
> +	rm -f "$HOME/stderr" &&
> +	echo "UTF-8 characters" >F &&
> +	printf "Commit message\n\nInvalid surrogate:\355\240\200\n" \
> +		>"$HOME/invalid" &&
good
> +	git commit -a -F "$HOME/invalid" \
> +		2>"$HOME"/stderr &&
> +	grep "did not conform" "$HOME"/stderr
> +'
> +
> +rm -f "$HOME/stderr"
Does it make sense to "grep on the fly", like this:
git commit -a -F "$HOME/invalid" 2>&1  | grep "did not conform"
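
One trade-off to keep in mind with the pipe: a pipeline reports the exit
status of its last command, so a failing "git commit" would go unnoticed as
long as grep finds the message. The sketch below shows this with a made-up
stand-in function (fake_commit, not a real git command, and its message text
is invented) and assumes the shell's "pipefail" option is not set.

```shell
# Stand-in for "git commit": prints a warning on stderr, then fails.
# The function name and the message wording are hypothetical.
fake_commit () {
	echo "warning: message did not conform" >&2
	return 1
}

# Piped form: the status seen is grep's, so the failure is hidden.
fake_commit 2>&1 | grep "did not conform" >/dev/null
pipeline_status=$?

# Temporary-file form: the command's own status stays visible.
stderr_file=$(mktemp)
if fake_commit 2>"$stderr_file"
then
	direct_status=0
else
	direct_status=$?
fi
grep "did not conform" "$stderr_file" >/dev/null
grep_status=$?
rm -f "$stderr_file"

echo "pipeline=$pipeline_status direct=$direct_status grep=$grep_status"
```

With the temporary file, the test can chain both steps with && and fail if
either the commit or the grep fails.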
