On Thu, 19 Jun 2008 12:29:16 +1000 Neil Brown <neilb@xxxxxxx> wrote: > On Wednesday June 18, jlayton@xxxxxxxxxx wrote: > > > > No objection to the patch, but what signal was being sent to nfsd when > > you saw this? If it's anything but a SIGKILL, then I wonder if we have > > a race that we need to deal with. My understanding is that we have nfsd > > flip between 2 sigmasks to prevent anything but a SIGKILL from being > > delivered while we're handling the local filesystem operation. > > SuSE /etc/init.d/nfsserver does > > killproc -n -KILL nfsd > > so it looks like a SIGKILL. > > > > > > From nfsd(): > > > > ----------[snip]----------- > > sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > > > > /* > > * Find a socket with data available and call its > > * recvfrom routine. > > */ > > while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) > > ; > > if (err < 0) > > break; > > update_thread_usage(atomic_read(&nfsd_busy)); > > atomic_inc(&nfsd_busy); > > > > /* Lock the export hash tables for reading. */ > > exp_readlock(); > > > > /* Process request with signals blocked. */ > > sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > > > > svc_process(rqstp); > > > > ----------[snip]----------- > > > > What happens if this catches a SIGINT after the err<0 check, but before > > the mask is set to allowed_mask? Does svc_process() then get called with > > a signal pending? > > Yes, I suspect it does. > > I wonder why we have all this mucking about this signal masks anyway. > Anyone have any ideas about what it actually achieves? > HCH asked me the same question when I did the conversion to kthreads. My interpretation (based on guesswork here) was that we wanted to distinguish between SIGKILL and other allowed signals. A SIGKILL is allowed to interrupt the underlying I/O, but other signals should not. The question to answer here, I suppose, is whether masking a pending signal is sufficient to make signal_pending() return false. If I'm looking correctly then the answer should be "yes". So I don't think we have a race here after all. I suspect that if SuSE used a different signal here, that would prevent this from happening. For the record, both RHEL and Fedora's init scripts use SIGINT for this. -- Jeff Layton <jlayton@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html