On 05/24, Deepa Dinamani wrote: > > On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > > > On 05/23, Deepa Dinamani wrote: > > > > > > Ok, since there has been quite a bit of argument here, I will > > > backtrack a little bit and maybe it will help us understand what's > > > happening here. > > > There are many scenarios being discussed on this thread: > > > a. State of code before 854a6ed56839a > > > > I think everything was correct, > > There were 2 things that were wrong: > > 1. If an unblocked signal was received, after the ep_poll(), then the > return status did not indicate that. Yes, > This is expected behavior > according to man page. If this is indeed what is expected then the man > page should note that signal will be delivered in this case and return > code will still be 0. > > "EINTR > The call was interrupted by a signal handler before either any of the > requested events occurred or the timeout expired; see signal(7)." and what do you think the man page could say? This is obviously possible for any syscall, and we can't avoid this. A signal can come right after syscall insn completes. The signal handler will be called but this won't change $rax, user-space can see return code == 0 or anything else. And this doesn't differ from the case when the signal comes before syscall returns. > 2. The restoring of the sigmask is done right in the syscall part and > not while exiting the syscall and if you get a blocked signal here, > you will deliver this to userspace. So I assume that this time you are talking about epoll_pwait() and not epoll_wait()... And I simply can't understand you. But yes, if the original mask doesn't include the pending signal it will be delivered while the syscall can return success/timout or -EFAULT or anything. This is correct, see above. > > > b. State after 854a6ed56839a > > > > obviously buggy, > > Ok, then can you point out what specifically was wrong with > 854a6ed56839a? Cough. If nothing else the lost -EINTR? > And, not how it could be more simple? Well, I already sent the patch and after that I even showed you the code with the patch applied. See https://lore.kernel.org/lkml/20190523143340.GA23070@xxxxxxxxxx/ > > What you are saying looks very confusing to me, I will assume that you > > meant something like > > > > - a signal SIG_XXX was blocked before sys_epoll_pwait() was called > > > > - sys_epoll_pwait(sigmask) unblocks SIG_XXX according to sigmask > > > > - sys_epoll_pwait() calls do_epoll_wait() which returns success > > > > - SIG_XXX comes after that and it is "never noticed" > > > > Yes. Everything is correct. And see my reply to David, SIG_XXX can even > > come _before_ sys_epoll_pwait() was called. > > No, I'm talking about a signal that was not blocked. OK, see above. > > > So the question is does the userspace have to know about this signal > > > or not. > > > > If userspace needs to know about SIG_XXX it should not block it, that is all. > > What should be the return value if a signal is detected after a fd completed? Did you mean "if a signal is detected after a ready fd was already found" ? In this case the return value should report success. But I have already lost, this all looks irrelevant wrt to fix we need. > > > What [b] does is to move the signal check closer to the restoration of > > > the signal. > > > > FOR NO REASON, afaics (to simplify, lets forget the problem with the wrong > > return value you are trying to fix). > > As I already pointed out, the restoring of the sigmask is done during > the syscall and not while exiting the syscall and if you get a blocked > signal here, you will deliver this to userspace. > > > And even if there were ANY reason to do this, note that (with or without this > > fix) the signal_pending() check inside restore_user_sigmask() can NOT help, > > simply because SIG_XXX can come right after this check. > > This I pointed out already that we should probably make this sequence atomic. See above. Oleg.