Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Thu, 04 Apr 2024 17:10:18 +0200

On Thu, Apr 04 2024 at 15:43, Oleg Nesterov wrote:
> On 04/04, Dmitry Vyukov wrote:
>> they all should get a signal eventually.
>
> Well, yes and no.
>
> No, in a sense that the motivation was not to ensure that all threads
> get a signal, the motivation was to ensure that cpu_timer_fire() paths
> will use the current task as the default target for signal_wake_up/etc.
> This is just optimization.
>
> But yes, all should get a signal eventually. And this will happen with
> or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of
> signals to the current thread"). Any thread can dequeue a shared signal,
> say, on return from interrupt.
>
> Just without that commit this "eventually" means A_LOT_OF_TIME
> statistically.

bcb7ee79029dca only directs the wakeup to current, but the signal is
still queued in the process wide shared pending list. So the thread
which sees sigpending() first will grab and deliver it to itself.

> But yes, I agree, if thread exits once it get a signal, then A_LOT_OF_TIME
> will be significantly decreased. But again, this is just statistical issue,
> I do not see how can we test the commit bcb7ee79029dca reliably.

We can't.

What we can actually test is the avoidance of waking up the main thread
by doing the following in the main thread:

     start_threads();
     barrier_wait();
     nanosleep(2 seconds);
     done = 1;
     stop_threads();

and in the first thread which is started:

first_thread()
     barrier_wait();
     start_timer();
     loop()

On a pre 6.3 kernel nanosleep() will return early because the main
thread is woken up and will eventually win the race to deliver the
signal.

On a 6.3 and later kernel nanosleep() will not return early because the
main thread is not woken up as the wake up is directed at current,
i.e. a worker thread, which is running anyway and will consume the
signal.

> OTOH. If the threads do not exit after they get signal, then _in theory_
> nothing can guarantee that this test-case will ever complete even with
> that commit. It is possible that one of the threads will "never" have a
> chance to run cpu_timer_fire().

Even with the exit I managed to make one out of 100 runs run into the
timeout because the main thread always won the race.

> In short, I leave this to you and Thomas. I have no idea how to write a
> "good" test for that commit.
>
> Well... perhaps the main thread should just sleep in pause(), and
> distribution_handler() should check that gettid() != getpid() ?
> Something like this maybe... We need to ensure that the main thread
> enters pause before timer_settime().

I'm testing a modification which implements something like the above and
the success condition is that the main thread does not return early from
nanosleep() and has no signal accounted. It survived 2000 iterations by
now.

Let me polish it up.

Thanks,

        tglx