Hello Claudio,

Sorry -- I ran out of time in the last months. Now I return to this.

On Tue, 12 Jun 2018 at 20:09, Claudio Scordino
<claudio@xxxxxxxxxxxxxxx> wrote:
>
> Hi Michael,
>
> how do we move forward from this situation ?
>
> The kernel source [1] and the related documentation [2] have been
> already updated to include both the SCHED_FLAG_RECLAIM and
> SCHED_FLAG_DL_OVERRUN flags.
>
> Now, it's time to update the sched_setattr() manpage as well (even if
> the SIGXCPU signal is per-process and not per-thread).
>
> Many thanks and best regards,

So, I have captured pretty much everything of our discussion (starting
from your initial patch) in manual page updates in the
sched_setattr(2) page. The text currently sits in a private branch in
my local Git. The text also tries to capture what I believe is the bug
with SIGXCPU being process-directed instead of thread-directed:

    SCHED_FLAG_DL_OVERRUN (since Linux 4.16)
        This flag allows an application to get informed about
        run-time overruns in SCHED_DEADLINE threads. Such overruns
        may be caused by (for example) coarse execution time
        accounting or incorrect parameter assignment. Notification
        takes the form of a SIGXCPU signal which is generated on
        each overrun.

        This SIGXCPU signal is process-directed (see signal(7))
        rather than thread-directed. This is probably a bug. On
        the one hand, sched_setattr() is being used to set a
        per-thread attribute. On the other hand, if the
        process-directed signal is delivered to a thread inside
        the process other than the one that had a run-time
        overrun, the application has no way of knowing which
        thread overran.

The question is: is there any plan to change this behavior, probably
in the direction of Luca's idea to have a per-process signal that
passes some info about which thread overran?

Thanks,

Michael

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/deadline.c
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-deadline.txt
>
> 2018-05-28 18:01 GMT+02:00 Juri Lelli <juri.lelli@xxxxxxxxxx>:
> > Hi,
> >
> > On 02/05/18 11:21, luca abeni wrote:
> >> Hi,
> >>
> >> On Mon, 30 Apr 2018 12:41:47 +0200
> >> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> wrote:
> >> [...]
> >> > >> 2. Possibly, I am misreading the code, but the SIGXCPU signal
> >> > >> appears to be a process-directed signal, rather than a
> >> > >> thread-directed signal? Am I correct, and if so, is that really
> >> > >> the desired behavior?
> >> > >>
> >> > >
> >> > > Actually, I can't remember and need to check the code.
> >> >
> >> > So, I went back and reviewed the kernel source. Indeed the signal
> >> > seems to be process-directed:
> >> >
> >> > static inline void check_dl_overrun(struct task_struct *tsk)
> >> > {
> >> >         if (tsk->dl.dl_overrun) {
> >> >                 tsk->dl.dl_overrun = 0;
> >> >                 __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
> >> >         }
> >> > }
> >> >
> >> > __group_send_sig_info() sends a signal to a thread group.
> >> >
> >> > This smells buggy. In sched_setattr(), we are setting per-thread
> >> > scheduling attributes. Surely, the signal should be thread-directed
> >> > when an overrun occurs? Otherwise, how does the application know
> >> > which thread overran?
> >>
> >> Sorry for jumping late in the discussion...
> >
> > And more sorry to be even more late to reply. :(
> >
> >> Anyway, I agree that there is a bug here (thanks for noticing!), and
> >> the signal was originally designed (in my understanding) to be
> >> thread-directed.
> >
> > Not sure. AFAIK, all threads of a group should be able (by default) to
> > receive and handle the signal.
> > It should however be the running task
> > that receives and handles it first since check_thread_timers() is called
> > from the timer interrupt and has the running task task_struct as
> > parameter and complete_signal() (at the end of __send_signal()) checks
> > if such task wants to handle the signal:
> >
> > https://elixir.bootlin.com/linux/v4.17-rc7/source/kernel/signal.c#L909
> >
> > Does it make any sense?
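
The difference between thread-directed and process-directed delivery
discussed in this thread can be observed from user space without
SCHED_DEADLINE. What follows is only a minimal sketch under stated
assumptions, not the kernel's overrun path: pthread_kill() and kill()
stand in for the overrun notification, and the names send_and_observe(),
struct observation and struct worker are hypothetical helpers invented
for the illustration. In the process-directed case the worker blocks
SIGXCPU, purely so that the outcome is deterministic; the point is that
the handler then runs in a thread other than the one the signal was
"about".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper types for this sketch; not part of any kernel API. */
struct observation {
    pid_t worker_tid;   /* thread standing in for the one that "overran" */
    pid_t handler_tid;  /* thread that actually ran the SIGXCPU handler */
};

struct worker {
    bool block_sigxcpu;
    pid_t tid;
    sem_t ready, done;
};

/* Kernel thread ID of the thread that last ran the handler (0 = none). */
static atomic_int last_handler_tid;

static pid_t my_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void on_sigxcpu(int sig)
{
    (void)sig;
    /* Record which thread the kernel picked to run the handler. */
    atomic_store(&last_handler_tid, (int)my_tid());
}

static void sem_wait_retry(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        ;   /* retry if a signal handler interrupted the wait */
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    if (w->block_sigxcpu) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGXCPU);
        pthread_sigmask(SIG_BLOCK, &set, NULL);
    }
    w->tid = my_tid();
    sem_post(&w->ready);
    sem_wait_retry(&w->done);
    return NULL;
}

/* Send SIGXCPU either to the whole process or to the worker thread only,
 * and report which thread ended up running the handler. */
struct observation send_and_observe(bool process_directed)
{
    struct observation obs = { 0, 0 };
    struct worker w;
    struct sigaction sa;
    pthread_t th;

    memset(&w, 0, sizeof(w));
    w.block_sigxcpu = process_directed;
    sem_init(&w.ready, 0, 0);
    sem_init(&w.done, 0, 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigxcpu;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGXCPU, &sa, NULL);
    atomic_store(&last_handler_tid, 0);

    if (pthread_create(&th, NULL, worker_main, &w) != 0)
        return obs;
    sem_wait_retry(&w.ready);

    if (process_directed)
        kill(getpid(), SIGXCPU);    /* any thread not blocking it may get it */
    else
        pthread_kill(th, SIGXCPU);  /* only the worker can get it */

    while (atomic_load(&last_handler_tid) == 0)
        sched_yield();              /* wait until the handler has run */

    sem_post(&w.done);
    pthread_join(th, NULL);
    sem_destroy(&w.ready);
    sem_destroy(&w.done);

    obs.worker_tid = w.tid;
    obs.handler_tid = (pid_t)atomic_load(&last_handler_tid);
    return obs;
}
```

Calling send_and_observe(false) should report the handler running in
the worker itself, whereas send_and_observe(true) should report it
running in the calling thread, which by itself tells the handler
nothing about the worker. On older glibc versions, build with -pthread.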