On Wed, Sep 11, 2024 at 03:32:46PM -0700, Junio C Hamano wrote:

> I refrain from merging my own patches until somebody else at least
> comments on them, but I'll make an exception and merge this to 'next',
> as a few "container" jobs _always_ fail to start otherwise. At least
> with the step removed, a run without any test failures would tell us
> that these container tasks are OK, which is much better.
>
> If somebody finds a better solution (i.e., a way to extract trash
> directories of failed tests', with actions/upload-artifact@v1), that
> can be added later on top.

I looked at this a bit last night, but as I didn't come up with any
useful solution, I refrained from replying. I was going to do a brief
write-up of all of my dead ends, but after banging my head against the
wall a bit more, I think I might actually have something. ;)

I think a better path forward is to figure out how to use an
up-to-date upload-artifact action for all jobs.

The root of the issue is that all of the runner images are 64-bit, and
the actions aren't prepared to run inside a 32-bit container. They're
written in javascript and ship with their own node.js, but it doesn't
work inside the container due to dynamic linking issues. So you
probably saw a message like this:

  exec /__e/node20/bin/node: no such file or directory

in the job log. That funky path is the node binary that ships with the
action, and the ENOENT is because it can't find the runtime linker.

Aside: I think the world would be a better place if they shipped a
totally static build of node, but there may be some complications
there. I found some discussion here:

  https://github.com/actions/setup-node/issues/922
  https://github.com/actions/runner/issues/2906

There are two options here, I think:

  1. Instead of running everything inside the container, we can run
     the actions outside (on the normal 64-bit runner image), and just
     do the build/test steps inside the container by manually running
     docker with the appropriate mounts.
     This was suggested here:

       https://github.com/actions/runner/issues/1491#issuecomment-970418408

     and apparently is how libgit2's CI works (I don't think it would
     be too hard to do, but their infrastructure wasn't totally
     trivial).

  2. After some tricky debugging via tmate[1], I found that we can
     install the necessary libc bits within the container. But there's
     another catch! They've moved to node20, which requires glibc 2.28,
     and the eight-year-old xenial release we're using is too old for
     that. We have to move to focal, which was released in 2020.

It feels like (1) is probably the more robust and flexible solution
overall. But we can get to (2) a lot more easily from where we are
now. Doing this:

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index cbdb677258..6b45dcad9d 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -347,8 +347,8 @@ jobs:
             image: alpine
             distro: alpine-latest
           - jobname: linux32
-            image: daald/ubuntu32:xenial
-            distro: ubuntu32-16.04
+            image: i386/ubuntu:focal
+            distro: ubuntu32-20.04
           - jobname: pedantic
             image: fedora
             distro: fedora-latest
@@ -358,27 +358,21 @@ jobs:
     runs-on: ubuntu-latest
     container: ${{matrix.vector.image}}
     steps:
-    - uses: actions/checkout@v4
-      if: matrix.vector.jobname != 'linux32'
-    - uses: actions/checkout@v1 # cannot be upgraded because Node.js Actions aren't supported in this container
+    - name: prepare libc6 for actions
+      run: apt update && apt install -y libc6-amd64 lib64stdc++6
       if: matrix.vector.jobname == 'linux32'
+    - uses: actions/checkout@v4
     - run: ci/install-dependencies.sh
     - run: ci/run-build-and-tests.sh
     - name: print test failures
       if: failure() && env.FAILED_TEST_ARTIFACTS != ''
       run: ci/print-test-failures.sh
     - name: Upload failed tests' directories
-      if: failure() && env.FAILED_TEST_ARTIFACTS != '' && matrix.vector.jobname != 'linux32'
+      if: failure() && env.FAILED_TEST_ARTIFACTS != ''
       uses: actions/upload-artifact@v4
       with:
         name: failed-tests-${{matrix.vector.jobname}}
         path: ${{env.FAILED_TEST_ARTIFACTS}}
-    - name: Upload failed tests' directories
-      if: failure() && env.FAILED_TEST_ARTIFACTS != '' && matrix.vector.jobname == 'linux32'
-      uses: actions/upload-artifact@v1 # cannot be upgraded because Node.js Actions aren't supported in this container
-      with:
-        name: failed-tests-${{matrix.vector.jobname}}
-        path: ${{env.FAILED_TEST_ARTIFACTS}}
   static-analysis:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'

lets us eliminate the special cases and just work with the regular
versions of each action. That gets us through the checkout action.

There does seem to be some fallout from using the more recent image,
though. Running "linux32" to change the machine personality in
ci/install-dependencies.sh now runs afoul of some seccomp
restrictions. Loosening docker like this works:

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 6b45dcad9d..ed66c0bea4 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -356,7 +356,9 @@ jobs:
       jobname: ${{matrix.vector.jobname}}
       distro: ${{matrix.vector.distro}}
     runs-on: ubuntu-latest
-    container: ${{matrix.vector.image}}
+    container:
+      image: ${{matrix.vector.image}}
+      options: --security-opt=seccomp=unconfined
     steps:
     - name: prepare libc6 for actions
       run: apt update && apt install -y libc6-amd64 lib64stdc++6

but it's not at all clear to me why we need to run "linux32" in the
first place. Sure, it's still a 64-bit kernel (we're just in a
userspace container), but the system knows it's 32-bit, and stuff like
"getconf LONG_BIT" returns 32.
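To illustrate the distinction (my own sketch, not part of the proposed
patches): "uname -m" reports what the kernel claims, which is exactly
what "linux32" fakes via personality(2), while "getconf LONG_BIT"
reflects the word size of the userspace libc. Inside an i386 container
on an amd64 runner I'd expect the former to say x86_64 and the latter
32, without any linux32 wrapper:

```shell
#!/bin/sh
# Compare the kernel's view with userspace's view of the word size.
# "linux32" only changes what uname reports (the personality); the
# libc-derived answer from getconf is what the build actually sees.
uname -m          # kernel architecture (this is what linux32 fakes)
getconf LONG_BIT  # word size of the userspace libc
```

Running that both inside and outside the container would confirm (or
refute) whether the explicit personality switch is still needed.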
So another solution may be just:

diff --git a/ci/install-dependencies.sh b/ci/install-dependencies.sh
index 2190c82313..8a8b832a35 100755
--- a/ci/install-dependencies.sh
+++ b/ci/install-dependencies.sh
@@ -60,11 +60,9 @@ ubuntu-*)
 	chmod a+x "$CUSTOM_PATH/jgit"
 	;;
 ubuntu32-*)
-	sudo linux32 --32bit i386 sh -c '
-		apt update >/dev/null &&
-		apt install -y build-essential libcurl4-openssl-dev \
-		libssl-dev libexpat-dev gettext python >/dev/null
-	'
+	sudo apt update >/dev/null &&
+	sudo apt install -y build-essential libcurl4-openssl-dev \
+	libssl-dev libexpat-dev gettext python >/dev/null
 	;;
 macos-*)
 	export HOMEBREW_NO_AUTO_UPDATE=1 HOMEBREW_NO_INSTALL_CLEANUP=1

but I'm worried I'm missing something, as it's been a while since I
poked at multi-arch stuff.

After that, we need this:

diff --git a/ci/install-dependencies.sh b/ci/install-dependencies.sh
index 4781cd20bb..2190c82313 100755
--- a/ci/install-dependencies.sh
+++ b/ci/install-dependencies.sh
@@ -22,6 +22,9 @@ then
 	}
 fi
 
+# Required so that apt doesn't wait for user input on certain packages.
+export DEBIAN_FRONTEND=noninteractive
+
 case "$distro" in
 alpine-*)
 	apk add --update shadow sudo build-base curl-dev openssl-dev expat-dev gettext \
@@ -34,8 +37,6 @@ fedora-*)
 	dnf -yq install make gcc findutils diffutils perl python3 gettext zlib-devel \
 		expat-devel openssl-devel curl-devel pcre2-devel >/dev/null
 	;;
 ubuntu-*)
-	# Required so that apt doesn't wait for user input on certain packages.
-	export DEBIAN_FRONTEND=noninteractive
 	sudo apt-get -q update
 	sudo apt-get -q -y install \

to cover the ubuntu32 job (I guess it just wasn't needed on the
ancient image we were using).

After that, we get all the way to the actual build. It looks like the
build fails because we're missing zlib. Presumably that was included
by default in the old image but not in the new one, so it needs to be
added to the package install list.

Taken together, I suspect we could just treat "ubuntu32" the same as
"ubuntu" in the ci/install-dependencies script.
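As a rough sketch of that consolidation (hypothetical and untested
against the real script: it echoes the apt-get commands instead of
running them so it can be tried without root, the package list is
abbreviated, and zlib1g-dev is only my guess at the package name for
the missing zlib):

```shell
#!/bin/sh
# Hypothetical folding of ubuntu32-* into the ubuntu-* arm of
# ci/install-dependencies.sh. Commands are echoed, not executed, so
# this runs unprivileged; the package list is abbreviated, and
# zlib1g-dev is an assumed name for the missing zlib dev package.
install_deps () {
	case "$1" in
	ubuntu-*|ubuntu32-*)
		# keep apt from prompting for input
		export DEBIAN_FRONTEND=noninteractive
		echo sudo apt-get -q update
		echo sudo apt-get -q -y install build-essential \
			libcurl4-openssl-dev libexpat-dev gettext zlib1g-dev
		;;
	*)
		echo >&2 "unhandled distro: $1"
		return 1
		;;
	esac
}

install_deps ubuntu32-20.04
```

The point is just that both distro prefixes share one arm; the real
package list would come from the existing ubuntu-* case.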
That would fix all of those issues, and simplify the script. Assuming
I'm not missing something.

-Peff

[1] Using tmate to diagnose failing "node" proved a bit tricky,
    because the tmate action also uses node, so it fails, too! For
    future reference, this is what I used to get it running manually:

      - name: Setup tmate
        if: always()
        run: |
          apt update && apt install -y xz-utils wget &&
          v=tmate-2.4.0-static-linux-i386 &&
          wget https://github.com/tmate-io/tmate/releases/download/2.4.0/$v.tar.xz &&
          xzcat $v.tar.xz | tar xvvf - &&
          cd $v &&
          ./tmate -F
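As an aside on that confusing "no such file or directory" from exec:
the same misleading ENOENT is easy to reproduce without any container,
because execve(2) reports ENOENT when the *interpreter* is missing,
not just the file itself. A tiny self-contained demonstration (my own
example, unrelated to the CI scripts):

```shell
#!/bin/sh
# The script exists and is executable, yet exec fails with ENOENT,
# because the interpreter named in its shebang is missing -- the same
# way node's runtime linker was missing in the 32-bit container.
tmp=$(mktemp)
printf '#!/no/such/interpreter\n' >"$tmp"
chmod +x "$tmp"
if "$tmp" 2>/dev/null
then
	echo "unexpectedly ran"
else
	echo "exec failed, but the file itself is there:"
	ls -l "$tmp"
fi
rm -f "$tmp"
```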