Re: [PATCH] sched: Move task_mm_cid_work to mm delayed work

Gabriele Monaco <gmonaco@xxxxxxxxxx> · Mon, 09 Dec 2024 14:45:54 +0100

> Thinking back on this, you'll want a program that does the following
> on a system with N CPUs:
> 
> - Phase 1: run one thread per cpu, pinned on each cpu. Print the
>    mm_cid from each thread with the cpu number every second or so.
> 
> - Exit all threads except the main thread, join them from the main
>    thread,
> 
> - Phase 2: the program is now single-threaded. We'd expect the
>    mm_cid value to converge towards 0 as the periodic task clears
>    unused CIDs.
> 
> So I think in phase 2 we can have an actual automated test: If after
> an order of magnitude more time than the 100ms delay between periodic
> tasks we still observe mm_cid > 0 in phase 2, then something is
> wrong.

I've been thinking about this and came up with a simple draft; I'll
probably send it as a separate patch.

Doing this can lead to false positives: the main thread may be assigned
mm_cid 0 and keep it until the end. In that scenario the other threads
(one per CPU) get different mm_cids and exit, while the main thread
still holds 0 and passes the test regardless of whether any unused CID
was ever cleared.

I have an idea to make it a bit more robust: run the threads as you
described in phase 1, then stop all but one (say, the one running on
the last core). Pin the main thread to core 0 so it doesn't accidentally
run on the same core as the survivor, and wait until the 2 remaining
threads hold mm_cids 0 and 1, in any order.
Apart from the special case of a single available core, this should
work fine. We could still get false positives, but that seems much less
likely to me.
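
To make the pass condition concrete, here is a minimal sketch of the
check only. It assumes the two surviving threads each publish their last
observed mm_cid into a shared atomic slot; the names main_cid, last_cid,
cids_compacted() and wait_for_compaction() are hypothetical and not
taken from the patch, and how each thread reads its own mm_cid (for
example through its rseq area) is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Pass condition: the two survivors hold mm_cids 0 and 1, in any order. */
bool cids_compacted(int cid_a, int cid_b)
{
	return (cid_a == 0 && cid_b == 1) || (cid_a == 1 && cid_b == 0);
}

/*
 * Poll the two published mm_cid values until they are compacted or
 * max_polls attempts have been made, sleeping poll_ms after each failed
 * attempt. main_cid and last_cid are hypothetical shared slots that each
 * surviving thread is assumed to refresh with its current mm_cid.
 */
bool wait_for_compaction(_Atomic int *main_cid, _Atomic int *last_cid,
			 int max_polls, long poll_ms)
{
	struct timespec delay = {
		.tv_sec = poll_ms / 1000,
		.tv_nsec = (poll_ms % 1000) * 1000000L,
	};

	assert(max_polls > 0 && poll_ms >= 0);
	for (int i = 0; i < max_polls; i++) {
		if (cids_compacted(atomic_load(main_cid), atomic_load(last_cid)))
			return true;
		nanosleep(&delay, NULL);
	}
	return false;
}
```

With the 100ms period between periodic tasks, a call such as
wait_for_compaction(&main_cid, &last_cid, 10, 100) would give roughly
the order-of-magnitude margin suggested above; the exact values are a
guess and would need tuning.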

Does it make sense to you?

Gabriele