On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 05/22, Deepa Dinamani wrote:
> >
> > -Deepa
> >
> > > On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > >
> > >> On 05/21, Deepa Dinamani wrote:
> > >>
> > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> > >> etc) only when there is no other error. If there is a signal and an error
> > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> > >> error codes.
> > >
> > > Ugh. I need to re-check, but at first glance I really dislike this change.
> > >
> > > I think we can fix the problem _and_ simplify the code. Something like below.
> > > The patch is obviously incomplete, it changes only one caller of
> > > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > > restore_user_sigmask() should simply die. Although perhaps another helper
> > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
> >
> > restore_user_sigmask() was added because of all the variants of these
> > syscalls we added because of y2038 as noted in commit message:
> >
> > signal: Add restore_user_sigmask()
> >
> > Refactor the logic to restore the sigmask before the syscall
> > returns into an api.
> > This is useful for versions of syscalls that pass in the
> > sigmask and expect the current->sigmask to be changed during
> > the execution and restored after the execution of the syscall.
> >
> > With the advent of new y2038 syscalls in the subsequent patches,
> > we add two more new versions of the syscalls (for pselect, ppoll
> > and io_pgetevents) in addition to the existing native and compat
> > versions. Adding such an api reduces the logic that would need to
> > be replicated otherwise.
>
> Again, I need to re-check, will continue tomorrow. But so far I am not sure
> this helper can actually help.
>
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> > >  		size_t, sigsetsize)
> > >  {
> > >  	int error;
> > > -	sigset_t ksigmask, sigsaved;
> > >
> > >  	/*
> > >  	 * If the caller wants a certain signal mask to be set during the wait,
> > >  	 * we apply it here.
> > >  	 */
> > > -	error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> > > +	error = set_user_sigmask(sigmask, sigsetsize);
> > >  	if (error)
> > >  		return error;
> > >
> > >  	error = do_epoll_wait(epfd, events, maxevents, timeout);
> > >
> > > -	restore_user_sigmask(sigmask, &sigsaved);
> > > +	if (error != -EINTR)
> >
> > As you address all the other syscalls this condition becomes more and
> > more complicated.
>
> May be.
>
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > >  static inline void set_restore_sigmask(void)
> > >  {
> > >  	set_thread_flag(TIF_RESTORE_SIGMASK);
> > > -	WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> >
> > So you always want do_signal() to be called?
>
> Why do you think so? No. This is just to avoid the warning, because with the
> patch I sent set_restore_sigmask() is called "in advance".
>
> > You will have to check each architecture's implementation of
> > do_signal() to check if that has any side effects.
>
> I don't think so.

Why not?

> > Although this is not what the patch is solving.
>
> Sure. But you know, after I tried to read the changelog, I am not sure
> I understand what exactly you are trying to fix. Could you please explain
> this part
>
> 	The behavior
> 	before 854a6ed56839a was that the signals were dropped after the error
> 	code was decided. This resulted in lost signals but the userspace did not
> 	notice it
>
> ? I fail to understand it, sorry. It looks as if the code was already buggy before
> that commit and it could miss a signal or something like this, but I do not see how.

Did you read the explanation linked in the commit message?
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Let me know which part you don't understand and I can explain more.
It would be better to understand the issue before we start discussing the fix.

-Deepa