Re: [PATCH] ci: disallow directional formatting

On Tue, Nov 02 2021, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@xxxxxx>
>
> As described in https://trojansource.codes/trojan-source.pdf, it is
> possible to abuse directional formatting (a feature of Unicode) to
> deceive human readers into interpreting code differently from compilers.
>
> It is highly unlikely that Git's source code wants to contain such
> directional formatting in the first place, so let's disallow it.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
> ---
>     ci: disallow directional formatting
>     
>     I just stumbled over
>     https://siliconangle.com/2021/11/01/trojan-source-technique-can-inject-malware-source-code-without-detection/,
>     which details an interesting social-engineering attack: it uses
>     directional formatting in source code to pretend to human readers that
>     the code does something different than it actually does.
>     
>     It is highly unlikely that Git's source code wants to contain such
>     directional formatting in the first place, so let's disallow it.
>     
>     Technically, this is not exactly -rc material, but the paper was just
>     published, and I want us to be safe.

There's a parallel discussion about doing something to detect this in
"git am", which for the git project seems like a better place to put
this.

> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1071%2Fdscho%2Fcheck-for-utf-8-directional-formatting-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1071/dscho/check-for-utf-8-directional-formatting-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1071
>
>  .github/workflows/main.yml | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
> index 6ed6a9e8076..7b4b4df03c3 100644
> --- a/.github/workflows/main.yml
> +++ b/.github/workflows/main.yml
> @@ -289,6 +289,13 @@ jobs:
>      - uses: actions/checkout@v2
>      - run: ci/install-dependencies.sh
>      - run: ci/run-static-analysis.sh
> +    - name: disallow Unicode directional formatting
> +      run: |
> +        # Use UTF-8-aware `printf` to feed a byte pattern to non-UTF-8-aware `git grep`
> +        # (Ubuntu's `git grep` is compiled without support for libpcre, otherwise we
> +        # could use `git grep -P` with the `\u` syntax).
> +        ! LANG=C git grep -Il "$(LANG=C.UTF-8 printf \
> +          '\\(\u202a\\|\u202b\\|\u202c\\|\u202d\\|\u202e\\|\u2066\\|\u2067\\|\u2068\\|\u2069\\)')"
>    sparse:
>      needs: ci-config
>      if: needs.ci-config.outputs.enabled == 'yes'
>
> base-commit: 0cddd84c9f3e9c3d793ec93034ef679335f35e49

It would be easier to maintain this if it were added to
ci/run-static-analysis.sh instead, where we have some similar tests. If
it lived there we could do away with the double-escaping, and it could
also be run manually.

Also, can't we just pipe "git ls-files -z" into "perl -0ne" here and use
its unconditional support for e.g. Unicode properties in regexes?
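
A rough, untested-in-CI sketch of what that could look like, under some
assumptions: "find_bidi" is a made-up helper name, not something that
exists in ci/ today, and rather than a Unicode property it matches the
raw UTF-8 bytes of the same nine code points the patch lists
(U+202A..U+202E, U+2066..U+2069), so it does not depend on the locale.
Unlike "git grep -I" it does not skip binary files.

```shell
# Hypothetical helper, for illustration only.
# Reads NUL-separated paths on stdin and prints every path whose
# contents include a directional formatting character.
find_bidi () {
	perl -0ne '
		chomp;                              # drop the trailing NUL from the path
		open(my $fh, "<:raw", $_) or next;  # skip anything we cannot read
		local $/;                           # slurp the file as raw bytes
		my $data = <$fh>;
		# UTF-8 encodings: E2 80 AA..AE (U+202A..U+202E)
		#                  E2 81 A6..A9 (U+2066..U+2069)
		print "$_\n" if $data =~ /\xe2(?:\x80[\xaa-\xae]|\x81[\xa6-\xa9])/;
	'
}

# Possible use from a script such as ci/run-static-analysis.sh:
#	test -z "$(git ls-files -z | find_bidi)"
```

Since the pattern lives in a single-quoted Perl script, there is no
printf step and no double-escaping, and the same function can be run by
hand from a checkout.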

How will this change impact RTL languages being added to po/?


