From: Johannes Schindelin <johannes.schindelin@xxxxxx>

As described in https://trojansource.codes/trojan-source.pdf, it is
possible to abuse directional formatting (a feature of Unicode) to
deceive human readers into interpreting code differently from compilers.

It is highly unlikely that Git's source code wants to contain such
directional formatting in the first place, so let's disallow it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
    ci: disallow directional formatting

    I just stumbled over
    https://siliconangle.com/2021/11/01/trojan-source-technique-can-inject-malware-source-code-without-detection/,
    which details an interesting social-engineering attack: it uses
    directional formatting in source code to pretend to human readers that
    the code does something different than it actually does.

    It is highly unlikely that Git's source code wants to contain such
    directional formatting in the first place, so let's disallow it.

    Technically, this is not exactly -rc material, but the paper was just
    published, and I want us to be safe.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1071%2Fdscho%2Fcheck-for-utf-8-directional-formatting-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1071/dscho/check-for-utf-8-directional-formatting-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1071

 .github/workflows/main.yml | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 6ed6a9e8076..7b4b4df03c3 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -289,6 +289,13 @@ jobs:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh
     - run: ci/run-static-analysis.sh
+    - name: disallow Unicode directional formatting
+      run: |
+        # Use UTF-8-aware `printf` to feed a byte pattern to non-UTF-8-aware `git grep`
+        # (Ubuntu's `git grep` is compiled without support for libpcre, otherwise we
+        # could use `git grep -P` with the `\u` syntax).
+        ! LANG=C git grep -Il "$(LANG=C.UTF-8 printf \
+          '\\(\u202a\\|\u202b\\|\u202c\\|\u202d\\|\u202e\\|\u2066\\|\u2067\\|\u2068\\|\u2069\\)')"
   sparse:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'

base-commit: 0cddd84c9f3e9c3d793ec93034ef679335f35e49
-- 
gitgitgadget
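
For anyone who wants to try the check outside of CI, here is a minimal sketch of the same idea using plain `grep` on individual files. It is not part of the patch: the helper name `has_bidi_controls` is hypothetical, and instead of relying on a UTF-8-aware `printf` with `\u` escapes it spells out the UTF-8 bytes in octal, which any POSIX `printf` should understand. It assumes a `grep` that matches raw bytes under `LC_ALL=C` (GNU grep does).

```shell
#!/bin/sh
# Illustrative helper, not part of the patch above.
#
# UTF-8 encodings of the nine code points the patch looks for:
#   U+202A..U+202E  ->  E2 80 AA..AE  (octal 342 200 252..256)
#   U+2066..U+2069  ->  E2 81 A6..A9  (octal 342 201 246..251)

# Succeeds (exit status 0) if the file named by $1 contains any of them.
has_bidi_controls () {
	# Octal escapes in the printf format produce the raw bytes; the
	# bracket expressions list the possible third byte explicitly.
	embeddings=$(printf '\342\200[\252\253\254\255\256]')
	isolates=$(printf '\342\201[\246\247\250\251]')

	# LC_ALL=C makes grep compare byte by byte, like the `LANG=C git grep`
	# in the workflow step; two -e patterns avoid needing `\|`.
	LC_ALL=C grep -q -e "$embeddings" -e "$isolates" -- "$1"
}

# When called with file names, print the ones that contain such characters.
for f in "$@"
do
	if has_bidi_controls "$f"
	then
		printf '%s\n' "$f"
	fi
done
```

Saved under a name of your choosing, it could be run as `sh <script> $(git ls-files)` in a worktree whose paths contain no whitespace. Unlike `git grep -I`, this sketch makes no attempt to skip binary files.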