On 18/07/2023 12:24, David Hildenbrand wrote: > On 18.07.23 13:23, David Hildenbrand wrote: >> On 18.07.23 12:49, Ryan Roberts wrote: >>> On 17/07/2023 18:40, David Hildenbrand wrote: >>>> On 17.07.23 12:31, Ryan Roberts wrote: >>>>> The `migration` test currently has a number of robustness problems that >>>>> cause it to hang and leak resources. >>>>> >>>>> Timeout: There are 3 tests, which each previously ran for 60 seconds. >>>>> However, the timeout in mm/settings for a single test binary was set to >>>>> 45 seconds. So when run using run_kselftest.sh, the top level timeout >>>>> would trigger before the test binary was finished. Solve this by meeting >>>>> in the middle; each of the 3 tests now runs for 20 seconds (for a total >>>>> of 60), and the top level timeout is set to 90 seconds. >>>>> >>>>> Leaking child processes: the `shared_anon` test fork()s some children >>>>> but then an ASSERT() fires before the test kills those children. The >>>>> assert causes immediate exit of the parent and leaking of the children. >>>>> Furthermore, if run using the run_kselftest.sh wrapper, the wrapper >>>>> would get stuck waiting for those children to exit, which never happens. >>>>> Solve this by deferring any asserts until after the children are killed. >>>>> The same pattern is used for the threaded tests for uniformity. >>>>> >>>>> With these changes, the test binary now runs to completion on arm64, >>>>> with 2 tests passing and the `shared_anon` test failing. >>>>> >>>>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> >>>>> --- >>>>> tools/testing/selftests/mm/migration.c | 14 ++++++++++---- >>>>> tools/testing/selftests/mm/settings | 2 +- >>>>> 2 files changed, 11 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/tools/testing/selftests/mm/migration.c >>>>> b/tools/testing/selftests/mm/migration.c >>>>> index 379581567f27..189d7d9070e8 100644 >>>>> --- a/tools/testing/selftests/mm/migration.c >>>>> +++ b/tools/testing/selftests/mm/migration.c >>>>> @@ -15,7 +15,7 @@ >>>>> #include <time.h> >>>>> #define TWOMEG (2<<20) >>>>> -#define RUNTIME (60) >>>>> +#define RUNTIME (20) >>>>> #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) >>>>> @@ -118,6 +118,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>>> { >>>>> uint64_t *ptr; >>>>> int i; >>>>> + int ret; >>>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>>> @@ -131,9 +132,10 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) >>>>> if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) >>>>> perror("Couldn't create thread"); >>>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>>> + ret = migrate(ptr, self->n1, self->n2); >>>>> for (i = 0; i < self->nthreads - 1; i++) >>>>> ASSERT_EQ(pthread_cancel(self->threads[i]), 0); >>>>> + ASSERT_EQ(ret, 0); >>>> >>>> Why is that required? This does not involve fork. >>> >>> It's not required. I was just trying to keep everything aligned to the same >>> pattern. >>> >>>> >>>>> } >>>>> /* >>>>> @@ -144,6 +146,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>>> pid_t pid; >>>>> uint64_t *ptr; >>>>> int i; >>>>> + int ret; >>>>> if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) >>>>> SKIP(return, "Not enough threads or NUMA nodes available"); >>>>> @@ -161,9 +164,10 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) >>>>> self->pids[i] = pid; >>>>> } >>>>> - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); >>>>> + ret = migrate(ptr, self->n1, self->n2); >>>>> for (i = 0; i < self->nthreads - 1; i++) >>>>> ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); >>>>> + ASSERT_EQ(ret, 0); >>>> >>>> >>>> Might be cleaner to also: >>> >>> Or instead of? I agree this is neater, so will undo the moving of the ASSERT() >>> and rely on this prctl. >> >> I was thinking about possible races when our parent process already >> quits before our child managed to set the prctl. prctl() won't do >> anything in that case, hmmmm. >> >> But similarly, existing code might already trigger the migrate() + kill >> before the child processes even started to access_mem(). >> >> Racy :) >> > > Maybe what would work, is checking after the prctl() in the child if the parent > is already gone. Like this? if (!pid) { prctl(PR_SET_PDEATHSIG, SIGHUP); /* Parent may have died before prctl so check now. */ if (getppid() == 1) kill(getpid(), SIGHUP); access_mem(ptr); } >