Re: [PATCH 1/1] sched_setattr: add new flags recently introduced

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> · Mon, 30 Apr 2018 12:41:47 +0200

Hello Claudio,

On 04/27/2018 03:31 PM, Claudio Scordino wrote:
> Hi Michael,
> 
> 2018-04-27 15:10 GMT+02:00 Michael Kerrisk (man-pages) <
> mtk.manpages@xxxxxxxxx>:
> 
>> Hello Claudio,
>>
>> Thanks for your patch!
>>
>> On 04/18/2018 11:39 AM, Claudio Scordino wrote:
>>> This patch documents two additional flags recently introduced for the
>>> sched_flags field of sched_setattr().
>>> These flags are namely SCHED_FLAG_RECLAIM [1] and SCHED_FLAG_DL_OVERRUN
>> [2].
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
>> linux.git/commit/?id=ccc9d651a7d26fec88d2375c4dd4e097cc0f8de7
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/
>> linux.git/commit/?id=34be39305a77b8b1ec9f279163c7cdb6cc719b91
>>>
>>> Signed-off-by: Claudio Scordino <claudio@xxxxxxxxxxxxxxx>
>>> ---
>>>  man2/sched_setattr.2 | 20 ++++++++++++++++----
>>>  1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/man2/sched_setattr.2 b/man2/sched_setattr.2
>>> index 492ccbc..54bf6e8 100644
>>> --- a/man2/sched_setattr.2
>>> +++ b/man2/sched_setattr.2
>>> @@ -150,15 +150,27 @@ This field specifies the scheduling policy, as one
>> of the
>>>  values listed above.
>>>  .TP
>>>  .I sched_flags
>>> -This field contains flags controlling scheduling behavior.
>>> -Only one such flag is currently defined:
>>> -.BR SCHED_FLAG_RESET_ON_FORK .
>>> -As a result of including this flag, children created by
>>> +This field contains flags controlling the scheduling behavior.
>>> +.IP
>>> +.BR SCHED_FLAG_RESET_ON_FORK :
>>> +children created by
>>>  .BR fork (2)
>>>  do not inherit privileged scheduling policies.
>>>  See
>>>  .BR sched (7)
>>>  for details.
>>> +.IP
>>> +.BR SCHED_FLAG_RECLAIM :
>>> +allows a
>>> +.BR SCHED_DEADLINE
>>> +task to reclaim bandwidth unused by other real-time tasks through the
>> GRUB
>>> +algorithm. Available since kernel 4.13.
>>
>> Of course, the above leaves the reader asking "what is the GRUB algorithm"?
>> What should we say about this? Or should the text somehow be worded more
>> specifically to avoid mentioning a specific algorith?
>>
> 
> I think you can just drop the final part of the sentence (i.e., "through
> the GRUB algorithm").
> BTW, GRUB is the specific scheduling algorithm implemented since kernel
> 4.13.
> It is already well documented in the Documentation files of the kernel
> (i.e.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-deadline.txt)
> but we probably don't want to expose such level of detail in the man page.

Agreed. I removed mention of the GRUB algoritm.

>>> +.IP
>>> +.BR SCHED_FLAG_DL_OVERRUN :
>>> +allows a
>>> +.BR SCHED_DEADLINE
>>> +task to get informed about runtime overruns through the delivery of
>> SIGXCPU
>>> +signals. Available since kernel 4.16.
>>
>> I've reworked this piece as:
>>
>>               SCHED_FLAG_DL_OVERRUN (since Linux 4.16)
>>                      This flag allows a SCHED_DEADLINE  to  get  informed
>>
> 
> You've missed task/thread/process after SCHED_DEADLINE (see below).
> 
> 
>>                      about run-time overruns, which may be caused by (for
>>                      example) coarse execution time accounting or  incor‐
>>                      rect  parameter  assignment.  Notification takes the
>>                      form of a SIGXCPU signal which is generated on  each
>>                      overrun.
>>
> 
> I think it is fine. Technically, whenever the task tries to consume more
> CPU runtime than guaranteed, it gets throttled and informed through such
> signal.
> There could be multiple reasons for such behavior: wrong parameters
> assigned by the designer (or just wrong estimations), coarse accounting,
> but also bugs, etc.
> Note that a sporadic overrun might also be a desired behavior: an
> application with occasional workload fluctuations could set minimal
> requirements for most of the cases and then get informed.
> Otherwise, an application could even use such signal for adapting its CPU
> requirements (e.g. a codec dynamically changing its quality based on such
> signal).
> 
>> But, I have some questions.
>>
>> 1. The text on coarse execution and incorrect parameter assignment
>>    comes from the mail message that was referred to by
>>    commit 34be39305a77b8b1ec9f279163c7cdb6cc719b91:
>>
>>         https://lkml.org/lkml/2009/10/16/170
>>
>>    Is that added text in the man page okay?
>>
> 
> I think it is OK, even if there could be other  reasons other than these
> two.
> 
> 
>>
>> 2. Possibly, I am misreading the code, but the SIGXCPU signal
>>    appears to be a process-directed signal, rather than a
>>    thread-directed signal? Am I correct, and if so, is that really
>>    the desired behavior?
>>
> 
> Actually, I can't remember and need to check the code.

So, I went back and reviewed the kernel course. Indeed the signal seems
to be process-directed:

static inline void check_dl_overrun(struct task_struct *tsk)
{
        if (tsk->dl.dl_overrun) {
                tsk->dl.dl_overrun = 0;
                __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
        }
}

__group_send_sig_info() sends a signal to a thread group. 

This smells buggy. In sched_setattr(), we are setting per-thread scheduling
attributes. Surely, the signal should be thread-directed when an overrun 
occurs? Otherwise, how does the application know which thread overran?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html