Re: [PATCH] sched: Move task_mm_cid_work to mm delayed work

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Mon, 9 Dec 2024 10:48:35 -0500

On 2024-12-09 10:33, Mathieu Desnoyers wrote:
On 2024-12-09 08:45, Gabriele Monaco wrote:

Thinking back on this, you'll want a program that does the following
on a system with N CPUs:

- Phase 1: run one thread per cpu, pinned on each cpu. Print the
    mm_cid from each thread with the cpu number every second or so.

- Exit all threads except the main thread, and join them from the main
    thread.

- Phase 2: the program is now single-threaded. We'd expect the
    mm_cid value to converge towards 0 as the periodic task clears
    unused CIDs.

So I think in phase 2 we can have an actual automated test: If after
an order of magnitude more time than the 100ms delay between periodic
tasks we still observe mm_cid > 0 in phase 2, then something is
wrong.
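The phase 2 check described above could be sketched as a polling loop with a deadline. This is only an illustration: wait_for_mm_cid(), mm_cid_reader_t, the two MM_CID_* constants and the fake_reader() stand-in are hypothetical names, and how the real mm_cid is read (for example from the thread's rseq area) is left to the callback. The 100ms period is the value quoted in the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns the mm_cid currently observed by the
 * calling thread. A real test would read it from the rseq area. */
typedef int (*mm_cid_reader_t)(void);

/* Period of the periodic task, as quoted in the text. */
#define MM_CID_SCAN_PERIOD_MS	100
/* "An order of magnitude more time" than the period. */
#define MM_CID_TIMEOUT_MS	(10 * MM_CID_SCAN_PERIOD_MS)

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll read_cid() every 10ms until it returns a value <= max_cid or
 * timeout_ms has elapsed. One last read is done after the deadline.
 * Returns true on convergence, false on timeout. */
static bool wait_for_mm_cid(mm_cid_reader_t read_cid, int max_cid,
			    long long timeout_ms)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	long long deadline;

	assert(read_cid != NULL);
	deadline = now_ms() + timeout_ms;
	do {
		if (read_cid() <= max_cid)
			return true;
		nanosleep(&pause, NULL);
	} while (now_ms() < deadline);
	return read_cid() <= max_cid;
}

/* Stand-in reader for dry-running the helper without rseq: returns the
 * current fake value, then lets it decay by one per call down to 0. */
static int fake_cid = 3;

static int fake_reader(void)
{
	int cid = fake_cid;

	if (fake_cid > 0)
		fake_cid--;
	return cid;
}
```

A test would call wait_for_mm_cid(reader, 0, MM_CID_TIMEOUT_MS) from the remaining thread and fail if it returns false.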

Been thinking about this and came up with a simple draft, I'll probably
send it as a separate patch.

Doing this can lead to false positives: the main thread may be assigned
mm_cid 0 and keep it until the end. In this scenario the other
threads (CPUs) would get different mm_cids and exit, and the main thread
would still have 0 and pass the test regardless.

I have an idea to make it a bit more robust: we can run threads as you
described in phase 1, stop all but one (let's say the one running on
the last core), make sure the main thread doesn't accidentally run on
the same core by pinning it to core 0, and wait until we see the 2
remaining threads holding 0 and 1, in any order.
Apart from the special case where only 1 core is available, this should
work fine. We could still get false positives, but they seem much
less likely to me.

Does it make sense to you?

A small tweak on your proposed approach: in phase 1, get each thread
to publish which mm_cid they observe, and select one thread which
has observed mm_cid > 1 (possibly the largest mm_cid) as the thread
that will keep running in phase 2 (in addition to the main thread).

All threads other than the main thread and that selected thread exit
and are joined before phase 2.

So you end up in phase 2 with:

- main (observed any mm_cid)
- selected thread (observed mm_cid > 1, possibly largest)

Then after a while, the selected thread should observe a
mm_cid <= 1.
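As a rough sketch of the selection step, assuming each thread has already published its observed mm_cid into a shared array indexed by thread (select_leader() and its parameters are hypothetical names, not taken from any existing selftest):

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the thread that published the largest mm_cid
 * strictly greater than min_cid, or -1 if no thread qualifies. */
static int select_leader(const int *observed_cid, size_t nr_threads,
			 int min_cid)
{
	int leader = -1;
	int best = min_cid;
	size_t i;

	assert(observed_cid != NULL || nr_threads == 0);
	for (i = 0; i < nr_threads; i++) {
		if (observed_cid[i] > best) {
			best = observed_cid[i];
			leader = (int)i;
		}
	}
	return leader;
}
```

With min_cid set to 1 this picks a thread that observed mm_cid > 1, preferring the largest. A return value of -1 means no thread qualifies, and the run cannot validate anything.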

This test should be skipped if there are fewer than 3 CPUs in the
allowed cpumask (sched_getaffinity).

Even better:

For a sched_getaffinity mask with N CPUs:

- If N == 1 -> skip (we cannot validate anything)
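The skip condition could be computed along these lines. This is a sketch only: nr_allowed_cpus() and should_skip() are hypothetical helper names, and a fixed-size cpu_set_t is assumed to be large enough for the machine.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>

/* Number of CPUs in the calling thread's allowed mask, or -1 if
 * sched_getaffinity() fails (e.g. more CPUs than cpu_set_t can hold). */
static int nr_allowed_cpus(void)
{
	cpu_set_t mask;

	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;
	return CPU_COUNT(&mask);
}

/* The test can only validate something with at least min_cpus CPUs. */
static int should_skip(int nr_cpus, int min_cpus)
{
	assert(min_cpus > 0);
	return nr_cpus < min_cpus;
}
```

For the variant above, should_skip(nr_allowed_cpus(), 2) covers the N == 1 case. An error return of -1 also leads to a skip.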

Phase 1: create N - 1 pthreads, each pinned to a CPU. The main thread
is also pinned to a CPU.
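One possible way to do the pinning is to snapshot the allowed mask once, before any thread is pinned, and have thread number n pin itself to the n-th CPU of that snapshot. pin_to_nth_cpu() is a hypothetical helper name. It relies on sched_setaffinity() with pid 0 applying to the calling thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to the n-th (0-based) CPU set in *allowed.
 * Returns the CPU number on success, -1 if *allowed has fewer than
 * n + 1 CPUs or if sched_setaffinity() fails. */
static int pin_to_nth_cpu(const cpu_set_t *allowed, int n)
{
	cpu_set_t target;
	int cpu;

	assert(allowed != NULL && n >= 0);
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, allowed))
			continue;
		if (n-- > 0)
			continue;	/* not the n-th allowed CPU yet */
		CPU_ZERO(&target);
		CPU_SET(cpu, &target);
		if (sched_setaffinity(0, sizeof(target), &target) != 0)
			return -1;
		return cpu;
	}
	return -1;
}
```

The snapshot matters because a new thread inherits the affinity of its creator: if main pins itself first and the threads then query their own mask, they would all see a single CPU.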

Publish the mm_cids observed by each thread, including main thread.

Select a new leader for phase 2: a thread which has observed a nonzero
mm_cid. Every other thread, possibly including the main thread, calls
pthread_exit, and the new leader calls pthread_join on each of them.

Then check that the new leader eventually observes mm_cid == 0.

And it works with an allowed cpu mask that has only 2 cpus.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com