Re: b4: unicode control characters -- warn or remove?

On Mon, Nov 01, 2021 at 09:02:34PM +0100, Ævar Arnfjörð Bjarmason wrote:
> It checks whitespace because that's something that's commonly a source
> of patch corruption. I'm not averse to adding this to core.whitespace,
> but trying to catch malicious injected code seems like a rather big
> expansion of its scope, particularly since:
> 
>     "[...]sending patches for docs actually written in RTL languages[...]"
> 
> Or just code? People write comments and even code in their native languages,
> and not all projects are as anglo-centric as those hosted on kernel.org.

My comment about docs was purely within the scope of the Linux kernel.

I think the following would be a sane check:

1. Are there Unicode control characters (CCs) present?
2. Are there other characters from RTL languages present in the same line?

If both 1 and 2 are true, this is a legitimate use of Unicode CCs. If only 1 is
true, then it's likely worth a warning.
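
A rough sketch of that two-step check in Python, not anything b4 or git
actually implements. It assumes "control characters" means the bidi
formatting characters (embeddings, overrides, isolates and marks), and that
"RTL characters" means anything the Unicode database classes as "R" or "AL".
The names BIDI_CONTROLS and suspicious_lines are made up for illustration.

```python
import unicodedata

# Assumption: the CCs of interest are the bidi formatting characters.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
)


def has_bidi_control(line):
    """Check 1: does the line contain any bidi control character?"""
    return any(ch in BIDI_CONTROLS for ch in line)


def has_rtl_text(line):
    """Check 2: does the line contain RTL text other than the controls?

    RLM and ALM themselves carry an RTL bidi class, so the controls are
    excluded before looking at the class.
    """
    return any(
        ch not in BIDI_CONTROLS
        and unicodedata.bidirectional(ch) in ("R", "AL")
        for ch in line
    )


def suspicious_lines(text):
    """Return 1-based numbers of lines where only check 1 is true."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), 1)
        if has_bidi_control(line) and not has_rtl_text(line)
    ]
```

A line with an RLO override and only ASCII around it gets flagged, while a
Hebrew line that happens to contain an RLM does not.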

Maybe even relax #2 to just check for Unicode characters above a certain
code point threshold where RTL scripts live. I think everyone will agree that
if there are Unicode CCs and no other non-ASCII characters in that same line,
it's likely not a legitimate use of control characters.
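
The relaxed variant could look roughly like the sketch below. The threshold
is purely an assumption for illustration: 0x0590 is where the Hebrew block
starts, so anything at or above it counts as "might be RTL text". The names
RTL_FLOOR and warn_relaxed are hypothetical.

```python
# Assumption: the same set of bidi formatting characters as the CCs of interest.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
)

# Assumption: start of the Hebrew block, used as a crude lower bound.
RTL_FLOOR = 0x0590


def warn_relaxed(line, floor=RTL_FLOOR):
    """Return True if the line has bidi controls but no other character
    at or above the threshold code point."""
    has_control = any(ch in BIDI_CONTROLS for ch in line)
    # The controls sit above the floor themselves, so skip them here.
    has_high_char = any(
        ch not in BIDI_CONTROLS and ord(ch) >= floor for ch in line
    )
    return has_control and not has_high_char
```

This is cheaper than a per-character bidi class lookup, at the cost of
treating any script above the threshold (CJK, for instance) as a reason not
to warn.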

-K


