Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

Deepa Dinamani <deepa.kernel@xxxxxxxxx> · Wed, 22 May 2019 09:33:50 -0700

On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> On 05/22, Deepa Dinamani wrote:
> >
> > -Deepa
> >
> > > On May 22, 2019, at 8:05 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > >
> > >> On 05/21, Deepa Dinamani wrote:
> > >>
> > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> > >> etc) only when there is no other error. If there is a signal and an error
> > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> > >> error codes.
> > >
> > > Ugh. I need to re-check, but at first glance I really dislike this change.
> > >
> > > I think we can fix the problem _and_ simplify the code. Something like below.
> > > The patch is obviously incomplete, it changes only one caller of
> > > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > > restore_user_sigmask() should simply die. Although perhaps another helper
> > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
> >
> > restore_user_sigmask() was added because of all the variants of these
> > syscalls that we added for y2038, as noted in the commit message:
> >
> >   signal: Add restore_user_sigmask()
> >
> >     Refactor the logic to restore the sigmask before the syscall
> >     returns into an api.
> >     This is useful for versions of syscalls that pass in the
> >     sigmask and expect the current->sigmask to be changed during
> >     the execution and restored after the execution of the syscall.
> >
> >     With the advent of new y2038 syscalls in the subsequent patches,
> >     we add two more new versions of the syscalls (for pselect, ppoll
> >     and io_pgetevents) in addition to the existing native and compat
> >     versions. Adding such an api reduces the logic that would need to
> >     be replicated otherwise.
>
> Again, I need to re-check, will continue tomorrow. But so far I am not sure
> this helper can actually help.
>
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> > >        size_t, sigsetsize)
> > > {
> > >    int error;
> > > -    sigset_t ksigmask, sigsaved;
> > >
> > >    /*
> > >     * If the caller wants a certain signal mask to be set during the wait,
> > >     * we apply it here.
> > >     */
> > > -    error = set_user_sigmask(sigmask, &ksigmask, &sigsaved, sigsetsize);
> > > +    error = set_user_sigmask(sigmask, sigsetsize);
> > >    if (error)
> > >        return error;
> > >
> > >    error = do_epoll_wait(epfd, events, maxevents, timeout);
> > >
> > > -    restore_user_sigmask(sigmask, &sigsaved);
> > > +    if (error != -EINTR)
> >
> > As you address all the other syscalls this condition becomes more and
> > more complicated.
>
> Maybe.
>
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > > static inline void set_restore_sigmask(void)
> > > {
> > >    set_thread_flag(TIF_RESTORE_SIGMASK);
> > > -    WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> >
> > So you always want do_signal() to be called?
>
> Why do you think so? No. This is just to avoid the warning, because with the
> patch I sent set_restore_sigmask() is called "in advance".
>
> > You will have to check each architecture's implementation of
> > do_signal() to check if that has any side effects.
>
> I don't think so.

Why not?

> > Although this is not what the patch is solving.
>
> Sure. But you know, after I tried to read the changelog, I am not sure
> I understand what exactly you are trying to fix. Could you please explain
> this part
>
>         The behavior
>         before 854a6ed56839a was that the signals were dropped after the error
>         code was decided. This resulted in lost signals but the userspace did not
>         notice it
>
> ? I fail to understand it, sorry. It looks as if the code was already buggy before
> that commit and it could miss a signal or something like this, but I do not see how.

Did you read the explanation pointed to in the commit text?

https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Let me know what part you don't understand and I can explain more.

It would be better to understand the issue before we start discussing the fix.

-Deepa
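
For readers following the thread, the error-code policy quoted at the top
("interrupted errors ... only when there is no other error") can be modelled
in a few lines of userspace C. This is only a sketch and not kernel code:
resolve_wait_result() and its parameters are hypothetical names, EINTR
stands in for the whole family of interrupted codes, and returning a
positive event count unchanged when a signal is pending is an assumption
that the quoted text does not spell out.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the policy described in the thread.
 *
 * ret:            result of the wait itself; a negative errno on failure,
 *                 otherwise a non-negative count of ready events.
 * signal_pending: true if a signal arrived while the temporary mask
 *                 was in place.
 */
static long resolve_wait_result(long ret, bool signal_pending)
{
	/* A real error such as -EINVAL takes precedence over the signal. */
	if (ret < 0)
		return ret;

	/* Nothing else to report, so the caller sees the interruption. */
	if (signal_pending && ret == 0)
		return -EINTR;

	/* Assumption: a positive event count is returned as-is. */
	return ret;
}
```

With this model, resolve_wait_result(-EINVAL, true) yields -EINVAL, while
resolve_wait_result(0, true) yields -EINTR.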