On Thu, 4 Apr 2024 at 15:45, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> Perhaps I am totally confused, but.
>
> On 04/04, Dmitry Vyukov wrote:
> >
> > On Wed, 3 Apr 2024 at 17:43, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > > > Why distribution_thread() can't simply exit if got_signal != 0 ?
> > > >
> > > > See https://lore.kernel.org/all/20230128195641.GA14906@xxxxxxxxxx/
> > >
> > > Indeed. It's too obvious :)
> >
> > This test models the intended use-case that was the motivation for the change:
> > We want to sample execution of a running multi-threaded program, it
> > has multiple active threads (that don't exit), since all threads are
> > running and consuming CPU,
>
> Yes,
>
> > they all should get a signal eventually.
>
> Well, yes and no.
>
> No, in a sense that the motivation was not to ensure that all threads
> get a signal, the motivation was to ensure that cpu_timer_fire() paths
> will use the current task as the default target for signal_wake_up/etc.
> This is just optimization.
>
> But yes, all should get a signal eventually. And this will happen with
> or without the commit bcb7ee79029dca ("posix-timers: Prefer delivery of
> signals to the current thread"). Any thread can dequeue a shared signal,
> say, on return from interrupt.
>
> Just without that commit this "eventually" means A_LOT_OF_TIME statistically.

I agree that any thread can pick the signal, but this A_LOT_OF_TIME
makes it impossible for the test to reliably repeatedly pass w/o the
change in any reasonable testing system.
With the change the test was finishing/passing for me immediately all the time.

Again, if the test causes practical problems (flaky), then I don't
mind relaxing it (flaky tests suck). I was just against giving up on
testing proactively just in case.

> > If threads will exit once they get a signal,
>
> just in case, the main thread should not exit ...
>
> > then the test will pass
> > even if signal delivery is biased towards a single running thread all
> > the time (the previous kernel impl).
>
> See above.
>
> But yes, I agree, if thread exits once it get a signal, then A_LOT_OF_TIME
> will be significantly decreased. But again, this is just statistical issue,
> I do not see how can we test the commit bcb7ee79029dca reliably.
>
> OTOH. If the threads do not exit after they get signal, then _in theory_
> nothing can guarantee that this test-case will ever complete even with
> that commit. It is possible that one of the threads will "never" have a
> chance to run cpu_timer_fire().
>
> In short, I leave this to you and Thomas. I have no idea how to write a
> "good" test for that commit.
>
> Well... perhaps the main thread should just sleep in pause(), and
> distribution_handler() should check that gettid() != getpid() ?
> Something like this maybe... We need to ensure that the main thread
> enters pause before timer_settime().
>
> Oleg.
>
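
For illustration only, the idea quoted above (main thread asleep, handler
checks which thread it runs on) could be sketched roughly as below. This is
not the selftest code and not a proposed patch. All sk_* names are
hypothetical, the 1 ms interval and the 5 second cutoff are arbitrary
assumptions, and the main thread blocks in pthread_join() rather than pause()
so that the sketch terminates on its own. It is Linux-specific
(SYS_gettid), and the short usleep() is only a best-effort way to let the
main thread block before timer_settime().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names; none of these exist in the selftest. */
static atomic_int sk_remaining;	/* signals still to be counted */
static atomic_int sk_on_leader;	/* counted signals handled by the main thread */

static void sk_handler(int sig)
{
	int saved_errno = errno;

	(void)sig;
	if (atomic_load(&sk_remaining) > 0) {
		/* tid == pid only in the thread-group leader (main thread) */
		if (syscall(SYS_gettid) == getpid())
			atomic_fetch_add(&sk_on_leader, 1);
		atomic_fetch_sub(&sk_remaining, 1);
	}
	errno = saved_errno;
}

static void *sk_worker(void *arg)
{
	struct itimerspec its = {
		.it_value    = { .tv_sec = 0, .tv_nsec = 1000 * 1000 },
		.it_interval = { .tv_sec = 0, .tv_nsec = 1000 * 1000 },
	};
	time_t deadline = time(NULL) + 5;	/* assumed safety cutoff */
	timer_t id;

	(void)arg;
	usleep(10000);	/* best effort: let the main thread block first */

	/* NULL sigevent: SIGALRM is sent to the process on expiry */
	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
		return (void *)(intptr_t)-1;
	if (timer_settime(id, 0, &its, NULL)) {
		timer_delete(id);
		return (void *)(intptr_t)-1;
	}

	/* Burn CPU so the process CPU-time timer keeps expiring. */
	while (atomic_load(&sk_remaining) > 0 && time(NULL) < deadline)
		;

	timer_delete(id);
	return NULL;
}

/*
 * Returns the number of counted signals that ran on the main thread,
 * or -1 on setup failure. The SIGALRM handler is left installed, since
 * a signal may still be pending after timer_delete().
 */
int sk_run(int nsignals)
{
	struct sigaction sa = { .sa_handler = sk_handler, .sa_flags = SA_RESTART };
	pthread_t thread;
	void *ret;

	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL))
		return -1;

	atomic_store(&sk_remaining, nsignals);
	atomic_store(&sk_on_leader, 0);

	if (pthread_create(&thread, NULL, sk_worker, NULL))
		return -1;
	/* The main thread sleeps here while the worker consumes CPU. */
	if (pthread_join(thread, &ret) || ret)
		return -1;

	return atomic_load(&sk_on_leader);
}
```

Calling sk_run(100) from the main thread and getting 0 back would suggest
that every counted signal was handled by the running thread. A non-zero
result would suggest that some were handled by the sleeping main thread.
Whether that should count as a failure depends on the kernel under test.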