Thanks
Richard
On 4/7/06, Nish Aravamudan <nish.aravamudan@xxxxxxxxx> wrote:
On 4/7/06, Richard <judicator3@xxxxxxxxx> wrote:
> I have a general question about how the kernel wakes up a blocking
> system call when a signal is received. For example blocking sleep or
> select. According to the man pages, sleep should deschedule the process
> for a certain amount of time. The process could also be prematurely
> woken up by a signal and in that case, the kernel returns the value of
> EINTR. I have looked at the kernel source code and it seems that signals
> could be somewhat lost. That is, the user calls sleep() or select(), a
> signal is received, and yet the signal is lost and the process remains
> descheduled. In the case of sleep, the process would sleep for the
> full amount of time, and select could possibly block forever.
>
> The kernel source code for sys_nanosleep is very simple; it is found in
> the file kernel/timer.c and involves two short functions, sys_nanosleep()
> and schedule_timeout(). The simplified pseudocode is as follows:
>
> 1 sys_nanosleep()
> 2 {
> 3 Get user arguments
> 4 Calculate time in future when the process has to wakeup
> 5 set state of process to TASK_INTERRUPTIBLE
> 6 set timer
> 7 schedule()
>
> 8 if (woken up by timer)
> 9 return 0
> 10 else
> 11 return EINTR
> 12 }
>
> When process A sends a signal to process B, the following happens if the
> signal is not ignored:
> - Process A sets the SIGPENDING flag of process B
> - Process A calls wake_up_process() to set the state of process to
> TASK_RUNNING.
>
> So in the normal case, process B wakes up and starts executing on line 8
> above. sys_nanosleep() returns to user-space. When returning to user-
> space the code in entry.S gets executed which checks for SIGPENDING flag
> and possibly calls the appropriate signal handler in user-space.
>
> The issue I have is that a signal could be lost for sys_nanosleep if
> wake_up_process() is called before line 5.
I'm not a scheduler expert, but I think schedule() takes care of this case.
If we get a signal, I believe signal_pending(current) will be true
(TIF_SIGPENDING is set, as you mention).
In schedule(), which is called by schedule_timeout() (and
schedule_hrtimer() in more recent kernels), we have
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                        unlikely(signal_pending(prev))))
                prev->state = TASK_RUNNING;
Which (I think) says if we are now in TASK_INTERRUPTIBLE and prev has
any signals pending, then we set prev's state back to RUNNING. I
believe this means we won't sleep at all for this task (but we may not
run again right away, I'm not sure). Whenever this task gets run
again, though, it will return directly to schedule_timeout() and
return back to sys_nanosleep(), where we'll return
ERESTART_RESTARTBLOCK, if there's any time leftover to sleep for, else
0.
Does that seem right?
Thanks,
Nish