On Tue, Nov 02 2021, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@xxxxxx>
>
> As described in https://trojansource.codes/trojan-source.pdf, it is
> possible to abuse directional formatting (a feature of Unicode) to
> deceive human readers into interpreting code differently from compilers.
>
> It is highly unlikely that Git's source code wants to contain such
> directional formatting in the first place, so let's disallow it.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> ---
>     ci: disallow directional formatting
>
>     I just stumbled over
>     https://siliconangle.com/2021/11/01/trojan-source-technique-can-inject-malware-source-code-without-detection/,
>     which details an interesting social-engineering attack: it uses
>     directional formatting in source code to pretend to human readers that
>     the code does something different than it actually does.
>
>     It is highly unlikely that Git's source code wants to contain such
>     directional formatting in the first place, so let's disallow it.
>
>     Technically, this is not exactly -rc material, but the paper was just
>     published, and I want us to be safe.

There's a parallel discussion about doing something to detect this in
"git am", which for the git project seems like a better place to put
this.

> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1071%2Fdscho%2Fcheck-for-utf-8-directional-formatting-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1071/dscho/check-for-utf-8-directional-formatting-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1071
>
>  .github/workflows/main.yml | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index 6ed6a9e8076..7b4b4df03c3 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -289,6 +289,13 @@ jobs:
>      - uses: actions/checkout@v2
>      - run: ci/install-dependencies.sh
>      - run: ci/run-static-analysis.sh
> +    - name: disallow Unicode directional formatting
> +      run: |
> +        # Use UTF-8-aware `printf` to feed a byte pattern to non-UTF-8-aware `git grep`
> +        # (Ubuntu's `git grep` is compiled without support for libpcre, otherwise we
> +        # could use `git grep -P` with the `\u` syntax).
> +        ! LANG=C git grep -Il "$(LANG=C.UTF-8 printf \
> +          '\\(\u202a\\|\u202b\\|\u202c\\|\u202d\\|\u202e\\|\u2066\\|\u2067\\|\u2068\\|\u2069\\)')"
>    sparse:
>      needs: ci-config
>      if: needs.ci-config.outputs.enabled == 'yes'
>
> base-commit: 0cddd84c9f3e9c3d793ec93034ef679335f35e49

It would be easier to maintain this if it were added to
ci/run-static-analysis.sh instead, where we have some similar tests. If
it lives there we could do away with the double-escaping, and it could
also be run manually.

Also, can't we just pipe "git ls-files -z" into "perl -0ne" here and use
its unconditional support for e.g. Unicode properties in regexes?

How will this change impact RTL languages being added to po/?