Re: [PATCH RESEND 2 1/1] sched/syscalls: Allow setting niceness using sched_param struct

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 13 Nov 2024 00:13:13 +0000
Michael Pratt <mcpratt@xxxxx> wrote:

> >
> > Why is stable Cc'd?
> >  
> 
> I believe this should be backported, if accepted,
> so that the behavior between kernel versions is matching.

That's not the purpose of stable. In fact, I would argue that it's the
opposite of what stable is for. A stable kernel should *not* change
behavior as that can cause regressions. If you want the newest behavior,
then you should use the newest kernels.


> I can do:
> 
>   $ cat /proc/$$/sched
> 
> and see the 120 without needing interpretation
> due to it being represented in a different way.

True it is exposed via files, but wouldn't this be the first change to make
it visible via a system call?

> 
> 
> > That said, you are worried about the race of spawning a new task and
> > setting its nice value because the new task may have exited. What about
> > using pidfd? Create a task returning the pidfd and use that to set its nice
> > value.  
> 
> I read a little about pidfd, but I'm not seeing the exact connection here,
> perhaps it will reduce the race condition but it cannot eliminate it as far as I see.
> For example, I am not finding a function that uses it to adjust niceness.

We can always add a system call do to that ;-)  In fact, there's a lot of
system calls that need to be converted to use pidfd over pid.

> 
> It's not that the "exit before modify" race condition is the only concern,
> it's just one of the less obvious factors making up my rationale for this change.
> I'm also concerned with efficiency. Why do we need to call another syscall
> if the syscall we are already in can handle it?
> 
> Personally, I find it strange that in sched_setscheduler()
> the policy can be changed but not the priority,
> when there is a standardized function dedicated to just that.

My concern is the man page that has (in Debian):

 $ man sched_setscheduler
[..]
       Currently, Linux supports the following "normal" (i.e., non-real-time) scheduling policies as values that may be specified in policy:

       SCHED_OTHER   the standard round-robin time-sharing policy;

       SCHED_BATCH   for "batch" style execution of processes; and

       SCHED_IDLE    for running very low priority background jobs.

       For each of the above policies, param->sched_priority must be 0.

Where we already document that the sched_priority "must be 0".

> 
> The difference between RT and normal processes
> is simply that for normal processes, we use "niceness" instead,
> so this patch simply translates the priority to "niceness",
> which cannot be expressed separately with the relevant POSIX functions.

I agree that POSIX has never been that great, but instead of modifying an
existing documented system call to do something that is documented not to
do, I believe we should either use other existing system calls or make a
new one.

-- Steve




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux