Re: [PATCH 1/1] sched_setattr: add new flags recently introduced

Claudio Scordino <claudio@xxxxxxxxxxxxxxx> · Wed, 2 May 2018 11:39:53 +0200

Hi Michael,

Il 30/04/2018 12:41, Michael Kerrisk (man-pages) ha scritto:
Hello Claudio,

On 04/27/2018 03:31 PM, Claudio Scordino wrote:
Hi Michael,

2018-04-27 15:10 GMT+02:00 Michael Kerrisk (man-pages) <
mtk.manpages@xxxxxxxxx>:

Hello Claudio,

Thanks for your patch!

On 04/18/2018 11:39 AM, Claudio Scordino wrote:
This patch documents two additional flags recently introduced for the
sched_flags field of sched_setattr().
These flags are namely SCHED_FLAG_RECLAIM [1] and SCHED_FLAG_DL_OVERRUN
[2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=ccc9d651a7d26fec88d2375c4dd4e097cc0f8de7
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
linux.git/commit/?id=34be39305a77b8b1ec9f279163c7cdb6cc719b91

Signed-off-by: Claudio Scordino <claudio@xxxxxxxxxxxxxxx>
---
  man2/sched_setattr.2 | 20 ++++++++++++++++----
  1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/man2/sched_setattr.2 b/man2/sched_setattr.2
index 492ccbc..54bf6e8 100644
--- a/man2/sched_setattr.2
+++ b/man2/sched_setattr.2
@@ -150,15 +150,27 @@ This field specifies the scheduling policy, as one
of the
  values listed above.
  .TP
  .I sched_flags
-This field contains flags controlling scheduling behavior.
-Only one such flag is currently defined:
-.BR SCHED_FLAG_RESET_ON_FORK .
-As a result of including this flag, children created by
+This field contains flags controlling the scheduling behavior.
+.IP
+.BR SCHED_FLAG_RESET_ON_FORK :
+children created by
  .BR fork (2)
  do not inherit privileged scheduling policies.
  See
  .BR sched (7)
  for details.
+.IP
+.BR SCHED_FLAG_RECLAIM :
+allows a
+.BR SCHED_DEADLINE
+task to reclaim bandwidth unused by other real-time tasks through the
GRUB
+algorithm. Available since kernel 4.13.

Of course, the above leaves the reader asking "what is the GRUB algorithm"?
What should we say about this? Or should the text somehow be worded more
specifically to avoid mentioning a specific algorith?


I think you can just drop the final part of the sentence (i.e., "through
the GRUB algorithm").
BTW, GRUB is the specific scheduling algorithm implemented since kernel
4.13.
It is already well documented in the Documentation files of the kernel
(i.e.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-deadline.txt)
but we probably don't want to expose such level of detail in the man page.

Agreed. I removed mention of the GRUB algoritm.

Good.


+.IP
+.BR SCHED_FLAG_DL_OVERRUN :
+allows a
+.BR SCHED_DEADLINE
+task to get informed about runtime overruns through the delivery of
SIGXCPU
+signals. Available since kernel 4.16.

I've reworked this piece as:

               SCHED_FLAG_DL_OVERRUN (since Linux 4.16)
                      This flag allows a SCHED_DEADLINE  to  get  informed


You've missed task/thread/process after SCHED_DEADLINE (see below).


                      about run-time overruns, which may be caused by (for
                      example) coarse execution time accounting or  incor‐
                      rect  parameter  assignment.  Notification takes the
                      form of a SIGXCPU signal which is generated on  each
                      overrun.


I think it is fine. Technically, whenever the task tries to consume more
CPU runtime than guaranteed, it gets throttled and informed through such
signal.
There could be multiple reasons for such behavior: wrong parameters
assigned by the designer (or just wrong estimations), coarse accounting,
but also bugs, etc.
Note that a sporadic overrun might also be a desired behavior: an
application with occasional workload fluctuations could set minimal
requirements for most of the cases and then get informed.
Otherwise, an application could even use such signal for adapting its CPU
requirements (e.g. a codec dynamically changing its quality based on such
signal).

But, I have some questions.

1. The text on coarse execution and incorrect parameter assignment
    comes from the mail message that was referred to by
    commit 34be39305a77b8b1ec9f279163c7cdb6cc719b91:

         https://lkml.org/lkml/2009/10/16/170

    Is that added text in the man page okay?


I think it is OK, even if there could be other  reasons other than these
two.



2. Possibly, I am misreading the code, but the SIGXCPU signal
    appears to be a process-directed signal, rather than a
    thread-directed signal? Am I correct, and if so, is that really
    the desired behavior?


Actually, I can't remember and need to check the code.

So, I went back and reviewed the kernel course. Indeed the signal seems
to be process-directed:

static inline void check_dl_overrun(struct task_struct *tsk)
{
         if (tsk->dl.dl_overrun) {
                 tsk->dl.dl_overrun = 0;
                 __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
         }
}

__group_send_sig_info() sends a signal to a thread group.

This smells buggy. In sched_setattr(), we are setting per-thread scheduling
attributes. Surely, the signal should be thread-directed when an overrun
occurs? Otherwise, how does the application know which thread overran?

This patch was a porting of an older patch. Unfortunately, nobody spotted such
inconsistency during the review on LKML.

If I've understood correctly, you're saying that we should have used
specific_send_sig_info() rather than __group_send_sig_info(), plus removing the
check inside check_process_timers().
Unfortunately, specific_send_sig_info() is currently defined as static.
So, we should either remove the static statement or rely on do_send_sig_info()
(unsure, since it performs much more work).

Otherwise, as Luca has just proposed, we could have a per-process signal
(much more useful, indeed) with some information about the thread who overrun.

I think we should discuss this on LKML. Do you mind triggering such discussion ?

Many thanks and best regards,

                   Claudio
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html