On Mon, Nov 01, 2021 at 09:02:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
> It checks whitespace because that's something that's commonly a source
> of patch corruption. I'm not adverse to adding this to core.whitespace,
> but trying to catch malicious injected code seems like a rather big
> expansion of its scope, particularly since:
>
> "[...]sending patches for docs actually written in RTL languages[...]"
>
> Or just code? People write comment and even in their native languages,
> and not all projects are as anglo-centric as those hosted on kernel.org.

My comment about docs was purely within the scope of the Linux kernel.

I think the following would be a sane check:

1. Are there Unicode control characters (CCs) present?
2. Are there other characters from RTL languages present in the same line?

If both 1 and 2 are true, this is a legitimate use of Unicode CCs. If only
1 is true, then it's likely worth a warning. Maybe even relax #2 to just
check for Unicode characters above a certain code point threshold, where
the RTL scripts live.

I think everyone will agree that if there are Unicode CCs and no other
Unicode characters in that same line, it's likely not a legitimate use of
control characters.

-K
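
For illustration only, the two-step check above could be sketched roughly
like this in Python. This is not a proposed patch. The names BIDI_CONTROLS,
has_rtl_text and suspicious_lines are made up for the example, and the set
of "control characters" is my assumption (the Unicode bidirectional
formatting characters), since the message does not enumerate them.

```python
import unicodedata

# Assumption: "control characters" means the Unicode bidirectional
# formatting characters (marks, embeddings, overrides, isolates).
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)


def has_rtl_text(line):
    """Check #2: does the line contain real RTL characters?

    The controls themselves are skipped, because some of them (for
    example U+200F) carry an RTL bidi class of their own.
    """
    return any(
        ch not in BIDI_CONTROLS
        and unicodedata.bidirectional(ch) in ("R", "AL")
        for ch in line
    )


def suspicious_lines(text):
    """Return 1-based numbers of lines where only check #1 is true."""
    flagged = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        has_controls = any(ch in BIDI_CONTROLS for ch in line)
        if has_controls and not has_rtl_text(line):
            flagged.append(lineno)
    return flagged
```

Here a line with a right-to-left override and only ASCII around it would be
flagged, while a Hebrew comment containing a right-to-left mark would not.
The relaxed variant of #2 would replace the bidi-class test with a plain
code point comparison.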