On Mon, Nov 01 2021, Eric Wong wrote:

> Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> wrote:
>> Hi, all:
>>
>> Per exhibit a, what should we do in the situation where we discover unicode
>> control characters in an email?
>>
>> 1. Warn and strip these chars out, because they are extremely unlikely to be
>>    doing anything legitimate in the context of a patch (unless someone is
>>    sending patches for docs actually written in RTL languages)
>> 2. Warn and error out, refusing to produce an mbox
>> 3. Just warn and produce an mbox anyway
>>
>> I'd normally do #3, but with many people piping things to git-am, I'm not sure
>> if it's the safest choice.
>>
>> Exhibit a: https://lwn.net/Articles/874546/
>
> +Cc: git@vger
>
> IMHO, defense for this belongs in git-am (which already checks
> things like whitespace).

It checks whitespace because that's a common source of patch
corruption.

I'm not averse to adding this to core.whitespace, but trying to catch
maliciously injected code seems like a rather big expansion of its
scope, particularly since:

    "[...]sending patches for docs actually written in RTL languages[...]"

Or just code? People write comments, and even code, in their native
languages, and not all projects are as anglo-centric as those hosted on
kernel.org.

I haven't checked what the overlap is between solving this issue and
i18n support, but we definitely should not assume that git is only
used by kernel.org users and similar, even for something as relatively
obscure as git-am.
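
To make the three options concrete, here is a rough Python sketch of what "warn and strip", "warn and error out" and "just warn" could look like on the text of a patch. Everything in it is an assumption for illustration: the names `BIDI_CONTROLS`, `find_bidi_controls` and `apply_policy` are hypothetical, the character set is just the Unicode bidirectional embedding, override and isolate controls (the thread does not say which characters would be checked), and none of this is existing git or mbox-tooling behavior.

```python
import unicodedata

# Assumed set: Unicode bidirectional embedding/override/isolate controls.
# Which characters a real check would cover is not settled in this thread.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(text):
    """Return a list of (line, column, character name), 1-based."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits


def apply_policy(text, policy="warn"):
    """Hypothetical helper covering the three options from the thread.

    policy="strip": warn and remove the characters (option 1)
    policy="error": warn and refuse to continue      (option 2)
    policy="warn":  warn and pass the text through   (option 3)
    """
    hits = find_bidi_controls(text)
    warnings = [
        "line %d, column %d: %s" % (lineno, col, name)
        for lineno, col, name in hits
    ]
    if hits and policy == "error":
        raise ValueError("bidi control characters found: " + "; ".join(warnings))
    if hits and policy == "strip":
        text = "".join(ch for ch in text if ch not in BIDI_CONTROLS)
    return text, warnings


if __name__ == "__main__":
    patch = "+/* comment \u202e reversed */\n+int x;\n"
    cleaned, warnings = apply_policy(patch, policy="strip")
    for w in warnings:
        print("warning:", w)
```

Note that the "strip" policy would also alter patches that legitimately contain RTL text, in docs or in code, which is exactly the concern above.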