Re: [PATCH v11 14/14] selftests/sgx: Add scripts for EPC cgroup testing

"Haitao Huang" <haitao.huang@xxxxxxxxxxxxxxx> · Sun, 14 Apr 2024 22:13:15 -0500

On Sun, 14 Apr 2024 10:01:03 -0500, Jarkko Sakkinen <jarkko@xxxxxxxxxx> wrote:

On Wed Apr 10, 2024 at 9:25 PM EEST, Haitao Huang wrote:
To run selftests for EPC cgroup:

sudo ./run_epc_cg_selftests.sh

To watch misc cgroup 'current' changes during testing, run this in a
separate terminal:

./watch_misc_for_tests.sh current
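Each refresh of the watch command prints every matching misc interface file under the test cgroups, preceded by its path. The sketch below reproduces a single refresh against a temporary directory that mimics the test hierarchy. The `show_misc` helper and the fake layout are hypothetical. It uses `-path`, the POSIX spelling of GNU find's `-wholename`.

```shell
#!/bin/sh
# Sketch: what one refresh of watch_misc_for_tests.sh prints, run once
# against a fake tree instead of under `watch`. show_misc and the
# temporary layout are hypothetical; the real script searches /sys/fs/cgroup.

# show_misc DIR FILE_SUFFIX, e.g. show_misc /sys/fs/cgroup current
show_misc() {
    # -path is the POSIX spelling of GNU find's -wholename. Inside the
    # single-quoted sh -c script, "$1" is the file path passed by find.
    find "$1" -path "*/test*/misc.$2" -exec \
        sh -c 'echo "$1:"; cat "$1"' _ {} \;
}

root=$(mktemp -d)
mkdir -p "$root/selftest/test1/test3" "$root/selftest/test2"
echo "sgx_epc 0" > "$root/selftest/test1/misc.current"
echo "sgx_epc 4096" > "$root/selftest/test2/misc.current"

# Prints each matching file's path followed by its contents.
show_misc "$root" current
rm -rf "$root"
```

The output order follows find's directory traversal, so it is not guaranteed to be stable between runs.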

With different cgroups, the script starts one or multiple concurrent SGX
selftests (test_sgx), each running the unclobbered_vdso_oversubscribed
test case, which loads an enclave whose EPC size equals the EPC capacity
available on the platform. The script checks the results against the
expectation set for each cgroup and reports success or failure.
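The "at least one" checks can be sketched in isolation. The pattern is to start several jobs in the background, wait for every one of them, and report whether any exit status matched the expectation. In the sketch, `sh -c "exit N"` stands in for test_sgx, and `any_status_matches` is a hypothetical name. It mirrors the idea of the patch's wait_and_detect_for_any helper but is not that code.

```shell
#!/bin/sh
# Sketch: run jobs concurrently, wait for all of them, and report whether
# any exit status matched the expectation. `sh -c "exit N"` stands in for
# test_sgx; any_status_matches is a hypothetical name.

# any_status_matches EXPECT PID...
#   EXPECT=1: returns 0 if any job exited with status 0
#   EXPECT=0: returns 0 if any job exited with a non-zero status
any_status_matches() {
    expect=$1
    shift
    found=1
    for pid in "$@"; do
        # Keep waiting after a match so no job is left behind.
        wait "$pid"
        st=$?
        if [ "$expect" -eq 1 ] && [ "$st" -eq 0 ]; then
            found=0
        elif [ "$expect" -eq 0 ] && [ "$st" -ne 0 ]; then
            found=0
        fi
    done
    return "$found"
}

pids=""
for code in 1 1 0 1; do
    sh -c "exit $code" &
    pids="$pids $!"
done

# $pids is left unquoted on purpose so it splits into separate PIDs.
if any_status_matches 1 $pids; then
    echo "at least one job succeeded"   # printed: the third job exits 0
else
    echo "no job succeeded"
fi
```

Calling the same helper with EXPECT=0 covers the "expecting at least one failure" case.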

The script creates 3 different cgroups at the beginning with the following
expectations:

1) SMALL - intentionally small enough to fail the test loading an
enclave of size equal to the capacity.
2) LARGE - large enough to run up to 4 concurrent tests but fail some if
more than 4 concurrent tests are run. The script starts 4, expecting at
least one test to pass, and then starts 5, expecting at least one test
to fail.
3) LARGER - limit is the same as the capacity, large enough to run lots of
concurrent tests. The script starts 8 of them and expects all to pass.
Then it reruns the same test with one process randomly killed and
checks that usage is zero after all processes exit.
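The arithmetic behind these limits, and the RAM limit used later by the memcg test, can be checked without SGX hardware. Each 4 KiB Version Array (VA) page holds 512 slots, one per evicted EPC page. An enclave of CAPACITY bytes therefore needs CAPACITY / 512 bytes of VA pages. In the sketch, a sample misc.capacity line with 128 MiB of EPC is an assumption; the real script reads /sys/fs/cgroup/misc.capacity.

```shell
#!/bin/sh
# Sketch of the limit arithmetic in run_epc_cg_selftests.sh. The sample
# misc.capacity line (128 MiB of EPC) is an assumption for illustration.
sample_capacity="sgx_epc 134217728"

CAPACITY=$(echo "$sample_capacity" | awk '$1 == "sgx_epc" {print $2}')

# CAPACITY / 4096 EPC pages need CAPACITY / 4096 / 512 VA pages, i.e.
# CAPACITY / 512 bytes, so SMALL cannot hold those VA pages plus the
# enclave's own pages.
SMALL=$(( CAPACITY / 512 ))
# Four times the VA footprint of one capacity-sized enclave.
LARGE=$(( SMALL * 4 ))
LARGER=$CAPACITY
# RAM limit for the memcg test: CAPACITY minus two LARGE limits.
MEM_LIMIT_TOO_SMALL=$(( CAPACITY - 2 * LARGE ))

echo "SMALL=$SMALL LARGE=$LARGE LARGER=$LARGER MEM=$MEM_LIMIT_TOO_SMALL"
# -> SMALL=262144 LARGE=1048576 LARGER=134217728 MEM=132120576
```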

The script also includes a test with a low mem_cg limit and a LARGE sgx_epc
limit to verify that the RAM used for per-cgroup reclamation is charged
to the proper mem_cg. For this test, it turns off swapping before starting,
and turns swapping back on afterwards.
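Every exit path of that test has to turn swap back on. One way to make this hard to get wrong is an EXIT trap set inside a subshell. The sketch below assumes that approach, which is not what the patch itself does. `swap_off` and `swap_on` are hypothetical stand-ins that only print; a real run would call `swapoff -a` and `swapon -a` as root.

```shell
#!/bin/sh
# Sketch: pair "turn swap off" with a guaranteed "turn swap back on".
# swap_off/swap_on are hypothetical stand-ins that only print.
swap_off() { echo "swap off"; }
swap_on()  { echo "swap on"; }

# run_memcg_test STATUS: the body runs in a subshell with its own EXIT
# trap, so swap_on runs whether the test finishes or bails out early.
run_memcg_test() (
    trap swap_on EXIT
    swap_off
    echo "running memcg test"
    exit "$1"
)

run_memcg_test 1 || echo "memcg test failed with status $?"
```

With an explicit `exit N`, the EXIT trap runs before the subshell terminates and the exit status stays N.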

Signed-off-by: Haitao Huang <haitao.huang@xxxxxxxxxxxxxxx>
---
V11:
- Remove cgroups-tools dependency and make scripts ash compatible. (Jarkko)
- Drop support for cgroup v1 and simplify. (Michal, Jarkko)
- Add documentation for functions. (Jarkko)
- Turn off swapping before memcontrol tests and back on after
- Format and style fixes, name for hard coded values

V7:
- Added memcontrol test.

V5:
- Added script with automatic results checking, remove the interactive
script.
- The script can run independently of the series below.
---
 tools/testing/selftests/sgx/ash_cgexec.sh     |  16 +
 .../selftests/sgx/run_epc_cg_selftests.sh     | 275 ++++++++++++++++++
 .../selftests/sgx/watch_misc_for_tests.sh     |  11 +
 3 files changed, 302 insertions(+)
 create mode 100755 tools/testing/selftests/sgx/ash_cgexec.sh
 create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
 create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh

diff --git a/tools/testing/selftests/sgx/ash_cgexec.sh b/tools/testing/selftests/sgx/ash_cgexec.sh
new file mode 100755
index 000000000000..cfa5d2b0e795
--- /dev/null
+++ b/tools/testing/selftests/sgx/ash_cgexec.sh
@@ -0,0 +1,16 @@
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2024 Intel Corporation.
+
+# Start a program in a given cgroup.
+# Supports V2 cgroup paths, relative to /sys/fs/cgroup
+if [ "$#" -lt 2 ]; then
+    echo "Usage: $0 <v2 cgroup path> <command> [args...]"
+    exit 1
+fi
+# Move this shell to the cgroup.
+echo 0 >/sys/fs/cgroup/$1/cgroup.procs
+shift
+# Execute the command within the cgroup
+exec "$@"
+
diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
new file mode 100755
index 000000000000..dd56273056fc
--- /dev/null
+++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
@@ -0,0 +1,275 @@
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+TEST_ROOT_CG=selftest
+TEST_CG_SUB1=$TEST_ROOT_CG/test1
+TEST_CG_SUB2=$TEST_ROOT_CG/test2
+# We will only set limit in test1 and run tests in test3
+TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
+TEST_CG_SUB4=$TEST_ROOT_CG/test4
+
+# Cgroup v2 only
+CG_ROOT=/sys/fs/cgroup
+mkdir -p $CG_ROOT/$TEST_CG_SUB1
+mkdir -p $CG_ROOT/$TEST_CG_SUB2
+mkdir -p $CG_ROOT/$TEST_CG_SUB3
+mkdir -p $CG_ROOT/$TEST_CG_SUB4
+
+# Turn on misc and memory controller in non-leaf nodes
+echo "+misc" >  $CG_ROOT/cgroup.subtree_control && \
+echo "+memory" > $CG_ROOT/cgroup.subtree_control && \
+echo "+misc" >  $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
+echo "+memory" > $CG_ROOT/$TEST_ROOT_CG/cgroup.subtree_control && \
+echo "+misc" >  $CG_ROOT/$TEST_CG_SUB1/cgroup.subtree_control
+if [ $? -ne 0 ]; then
+    echo "# Failed setting up cgroups, make sure misc and memory cgroups are enabled."
+    exit 1
+fi
+
+CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}')
+# This is below the number of VA pages needed for an enclave of capacity
+# size, so the oversubscribed cases should fail.
+SMALL=$(( CAPACITY / 512 ))
+
+# Load at least one enclave of capacity size successfully, possibly up to 4.
+# Some may fail if more than 4 concurrent enclaves of capacity size are run.
+LARGE=$(( SMALL * 4 ))
+
+# Load lots of enclaves
+LARGER=$CAPACITY
+echo "# Setting up limits."
+echo "sgx_epc $SMALL" > $CG_ROOT/$TEST_CG_SUB1/misc.max && \
+echo "sgx_epc $LARGE" >  $CG_ROOT/$TEST_CG_SUB2/misc.max && \
+echo "sgx_epc $LARGER" > $CG_ROOT/$TEST_CG_SUB4/misc.max
+if [ $? -ne 0 ]; then
+    echo "# Failed setting up misc limits."
+    exit 1
+fi
+
+clean_up()
+{
+    sleep 2
+    rmdir $CG_ROOT/$TEST_CG_SUB2
+    rmdir $CG_ROOT/$TEST_CG_SUB3
+    rmdir $CG_ROOT/$TEST_CG_SUB4
+    rmdir $CG_ROOT/$TEST_CG_SUB1
+    rmdir $CG_ROOT/$TEST_ROOT_CG
+}
+
+timestamp=$(date +%Y%m%d_%H%M%S)
+
+test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed"
+
+PROCESS_SUCCESS=1
+PROCESS_FAILURE=0
+
+# Wait for a process and check for expected exit status.
+#
+# Arguments:
+#	$1 - the pid of the process to wait and check.
+#	$2 - 1 if expecting success, 0 for failure.
+#
+# Return:
+#	0 if the exit status of the process matches the expectation.
+#	1 otherwise.
+wait_check_process_status() {
+    pid=$1
+    check_for_success=$2
+
+    wait "$pid"
+    status=$?
+
+    if [ $check_for_success -eq $PROCESS_SUCCESS ] && [ $status -eq 0 ]; then
+        echo "# Process $pid succeeded."
+        return 0
+    elif [ $check_for_success -eq $PROCESS_FAILURE ] && [ $status -ne 0 ]; then
+        echo "# Process $pid returned failure."
+        return 0
+    fi
+    return 1
+}
+
+# Wait for a set of processes and check for expected exit status
+#
+# Arguments:
+#	$1 - 1 if expecting success, 0 for failure.
+#	remaining args - The pids of the processes
+#
+# Return:
+#	0 if exit status of any process matches the expectation.
+#	1 otherwise.
+wait_and_detect_for_any() {
+    check_for_success=$1
+
+    shift
+    detected=1 # 0 for success detection
+
+    for pid in $@; do
+        if wait_check_process_status "$pid" "$check_for_success"; then
+            detected=0
+            # Wait for other processes to exit
+        fi
+    done
+
+    return $detected
+}
+
+echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..."
+# Always use a leaf node of the misc cgroup hierarchy.
+# These may fail on OOM.
+./ash_cgexec.sh $TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1
+if [ $? -eq 0 ]; then
+    echo "# Failed on SMALL limit, no test is expected to pass."
+    clean_up
+    exit 1
+else
+    echo "# Test failed as expected."
+fi
+
+echo "# PASSED SMALL limit."
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+        expecting at least one success...."
+
+pids=""
+for i in 1 2 3 4; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+
+if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
+    echo "# PASSED LARGE limit positive testing."
+else
+    echo "# Failed on LARGE limit positive testing, no test passes."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+        expecting at least one failure...."
+pids=""
+for i in 1 2 3 4 5; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
+    echo "# PASSED LARGE limit negative testing."
+else
+    echo "# Failed on LARGE limit negative testing, no test fails."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
+        expecting no failure...."
+pids=""
+for i in 1 2 3 4 5 6 7 8; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_FAILURE "$pids"; then
+    echo "# Failed on LARGER limit, at least one test fails."
+    clean_up
+    exit 1
+else
+    echo "# PASSED LARGER limit tests."
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
+      randomly kill one, expecting no failure...."
+pids=""
+for i in 1 2 3 4 5 6 7 8; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB4 $test_cmd >cgtest_larger_kill_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+random_number=$(awk 'BEGIN{srand();print int(rand()*5)}')
+sleep $((random_number + 1))
+
+# Randomly select a process to kill
+# Make sure the usage counter is not leaked at the end.
+RANDOM_INDEX=$(awk 'BEGIN{srand();print int(rand()*8)}')
+counter=0
+for pid in $pids; do
+    if [ "$counter" -eq "$RANDOM_INDEX" ]; then
+        PID_TO_KILL=$pid
+        break
+    fi
+    counter=$((counter + 1))
+done
+
+kill $PID_TO_KILL
+echo "# Killed process with PID: $PID_TO_KILL"
+
+any_failure=0
+for pid in $pids; do
+    wait "$pid"
+    status=$?
+    if [ "$pid" != "$PID_TO_KILL" ]; then
+        if [ $status -ne 0 ]; then
+	    echo "# Process $pid returned failure."
+            any_failure=1
+        fi
+    fi
+done
+
+if [ $any_failure -ne 0 ]; then
+    echo "# Failed on random killing, at least one test fails."
+    clean_up
+    exit 1
+fi
+echo "# PASSED LARGER limit test with a process randomly killed."
+
+MEM_LIMIT_TOO_SMALL=$((CAPACITY - 2 * LARGE))
+
+echo "$MEM_LIMIT_TOO_SMALL" > $CG_ROOT/$TEST_CG_SUB2/memory.max
+if [ $? -ne 0 ]; then
+    echo "# Failed setting the memory limit."
+    clean_up
+    exit 1
+fi
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE EPC limit,
+        and too small RAM limit, expecting all failures...."
+# Ensure swapping is off so the OOM killer is activated when the mem_cgroup limit is hit.
+swapoff -a
+pids=""
+for i in 1 2 3 4; do
+    (
+        ./ash_cgexec.sh $TEST_CG_SUB2 $test_cmd >cgtest_large_oom_$timestamp.$i.log 2>&1
+    ) &
+    pids="$pids $!"
+done
+
+if wait_and_detect_for_any $PROCESS_SUCCESS "$pids"; then
+    echo "# Failed on tests with memcontrol, some tests did not fail."
+    clean_up
+    swapon -a
+    exit 1
+else
+    swapon -a
+    echo "# PASSED LARGE limit tests with memcontrol."
+fi
+
+sleep 2
+
+USAGE=$(grep '^sgx_epc' "$CG_ROOT/$TEST_ROOT_CG/misc.current" | awk '{print $2}')
+if [ "$USAGE" -ne 0 ]; then
+    echo "# Failed: Final usage is $USAGE, not 0."
+else
+    echo "# PASSED leakage check."
+    echo "# PASSED ALL cgroup limit tests, cleanup cgroups..."
+fi
+clean_up
+echo "# done."
diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
new file mode 100755
index 000000000000..1c9985726ace
--- /dev/null
+++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
@@ -0,0 +1,11 @@
+#!/usr/bin/env sh
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023, 2024 Intel Corporation.
+
+if [ -z "$1" ]; then
+    echo "No argument supplied, please provide 'max', 'current', or 'events'"
+    exit 1
+fi
+
+watch -n 1 'find /sys/fs/cgroup -wholename "*/test*/misc.'$1'" -exec \
+    sh -c '\''echo "$1:"; cat "$1"'\'' _ {} \;'

So this is what happens now:

jarkko@mustatorvisieni:~/linux-tpmdd> make -C tools/testing/selftests/sgx run_tests
make: Entering directory '/home/jarkko/linux-tpmdd/tools/testing/selftests/sgx'
TAP version 13
1..1
# timeout set to 45
# selftests: sgx: test_sgx
# TAP version 13
# 1..16
# # Starting 16 tests from 1 test cases.
# #  RUN           enclave.unclobbered_vdso ...
# #            OK  enclave.unclobbered_vdso
# ok 1 enclave.unclobbered_vdso
# #  RUN           enclave.unclobbered_vdso_oversubscribed ...
# #            OK  enclave.unclobbered_vdso_oversubscribed
# ok 2 enclave.unclobbered_vdso_oversubscribed
# #  RUN           enclave.unclobbered_vdso_oversubscribed_remove ...
# # main.c:402:unclobbered_vdso_oversubscribed_remove:Creating an enclave with 98566144 bytes heap may take a while ...
# # main.c:457:unclobbered_vdso_oversubscribed_remove:Changing type of 98566144 bytes to trimmed may take a while ...
# # main.c:473:unclobbered_vdso_oversubscribed_remove:Entering enclave to run EACCEPT for each page of 98566144 bytes may take a while ...
# # main.c:494:unclobbered_vdso_oversubscribed_remove:Removing 98566144 bytes from enclave may take a while ...
# #            OK  enclave.unclobbered_vdso_oversubscribed_remove
# ok 3 enclave.unclobbered_vdso_oversubscribed_remove
# #  RUN           enclave.clobbered_vdso ...
# #            OK  enclave.clobbered_vdso
# ok 4 enclave.clobbered_vdso
# #  RUN           enclave.clobbered_vdso_and_user_function ...
# #            OK  enclave.clobbered_vdso_and_user_function
# ok 5 enclave.clobbered_vdso_and_user_function
# #  RUN           enclave.tcs_entry ...
# #            OK  enclave.tcs_entry
# ok 6 enclave.tcs_entry
# #  RUN           enclave.pte_permissions ...
# #            OK  enclave.pte_permissions
# ok 7 enclave.pte_permissions
# #  RUN           enclave.tcs_permissions ...
# #            OK  enclave.tcs_permissions
# ok 8 enclave.tcs_permissions
# #  RUN           enclave.epcm_permissions ...
# #            OK  enclave.epcm_permissions
# ok 9 enclave.epcm_permissions
# #  RUN           enclave.augment ...
# #            OK  enclave.augment
# ok 10 enclave.augment
# #  RUN           enclave.augment_via_eaccept ...
# #            OK  enclave.augment_via_eaccept
# ok 11 enclave.augment_via_eaccept
# #  RUN           enclave.tcs_create ...
# #            OK  enclave.tcs_create
# ok 12 enclave.tcs_create
# #  RUN           enclave.remove_added_page_no_eaccept ...
# #            OK  enclave.remove_added_page_no_eaccept
# ok 13 enclave.remove_added_page_no_eaccept
# #  RUN           enclave.remove_added_page_invalid_access ...
# #            OK  enclave.remove_added_page_invalid_access
# ok 14 enclave.remove_added_page_invalid_access
# #  RUN           enclave.remove_added_page_invalid_access_after_eaccept ...
# #            OK  enclave.remove_added_page_invalid_access_after_eaccept
# ok 15 enclave.remove_added_page_invalid_access_after_eaccept
# #  RUN           enclave.remove_untouched_page ...
# #            OK  enclave.remove_untouched_page
# ok 16 enclave.remove_untouched_page
# # PASSED: 16 / 16 tests passed.
# # Totals: pass:16 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 1 selftests: sgx: test_sgx

The cgroup tests should also run when invoking "run_tests".

I can add the SGX cgroup tests to the "run_tests" target, but they will need
more than the 45 seconds allowed by the default timeout, especially on
platforms with a larger EPC. You will need to run with --override-timeout
(the current SGX selftests also time out with the default 45 seconds on a
server with 4G of EPC).
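A per-test time limit turns a slow but otherwise correct run into a failure. The sketch below assumes the limit is enforced with GNU coreutils `timeout`, which exits with status 124 when the limit is hit. `sleep` stands in for a long-running test such as the EPC cgroup script, and `run_with_limit` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: how a time limit turns a slow run into a failure.
# Assumes GNU coreutils `timeout` (exit status 124 on expiry);
# run_with_limit is a hypothetical helper.
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*"
    fi
    return "$rc"
}

# Exceeds the limit: prints the TIMEOUT line, status 124.
run_with_limit 1 sleep 3 || echo "status $?"
# Finishes in time: status 0.
run_with_limit 5 sleep 1 && echo "finished in time"
```

Raising the limit, rather than shortening the test, is what an overridden timeout amounts to.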


I also wonder whether we really want this:

diff --git a/config/x86_64/default b/config/x86_64/default
index 246b1f3df46..6c3d20af7b4 100644
--- a/config/x86_64/default
+++ b/config/x86_64/default
@@ -502,6 +502,7 @@ CONFIG_X86_INTEL_TSX_MODE_OFF=y
 # CONFIG_X86_INTEL_TSX_MODE_ON is not set
 # CONFIG_X86_INTEL_TSX_MODE_AUTO is not set
 CONFIG_X86_SGX=y
+CONFIG_CGROUP_SGX_EPC=y
 # CONFIG_X86_USER_SHADOW_STACK is not set
 CONFIG_EFI=y
 CONFIG_EFI_STUB=y

It is a small change, but it affects every downstream, not just OpenSUSE.

I have a hard time imagining a situation where you would want SGX but not
cgroups support, so perhaps it would be a better idea to enable the SGX
cgroup support unconditionally when both SGX and cgroups are part of the
kernel config?

Then downstream can just pick the patches and call it a day...

BR, Jarkko


I have no issue with removing this config and conditionally compiling in the
SGX cgroup implementation when MISC is configured.
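Under this proposal, there would be no separate CONFIG_CGROUP_SGX_EPC. EPC cgroup support would be built whenever both CONFIG_X86_SGX and CONFIG_CGROUP_MISC are enabled. The dependency can be sketched as a check on a .config file. `epc_cgroup_built` is a hypothetical helper, and the sample configs are illustrative only.

```shell
#!/bin/sh
# Sketch: under the proposal, EPC cgroup support is built exactly when
# both SGX and the misc cgroup controller are enabled in .config.
# epc_cgroup_built is a hypothetical helper.
epc_cgroup_built() {
    grep -qx 'CONFIG_X86_SGX=y' "$1" && grep -qx 'CONFIG_CGROUP_MISC=y' "$1"
}

cfg=$(mktemp)
printf '%s\n' 'CONFIG_X86_SGX=y' 'CONFIG_CGROUP_MISC=y' > "$cfg"
if epc_cgroup_built "$cfg"; then
    echo "EPC cgroup support would be compiled in"
fi
rm -f "$cfg"
```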

Thanks
Haitao