Re: [PATCH] NLM: hold BKL when clearing global lockd task and serv vars

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 8 Apr 2008 16:20:26 -0400

On Tue, 8 Apr 2008 16:08:40 -0400
Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:

> On Apr 8, 2008, at 12:28 PM, J. Bruce Fields wrote:
> > On Tue, Apr 08, 2008 at 09:21:02AM -0400, Jeff Layton wrote:
> >> On Mon, 7 Apr 2008 16:50:27 -0400
> >> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> >>
> >>> On Mon, Apr 07, 2008 at 04:22:41PM -0400, Jeff Layton wrote:
> >>>> On Mon, 7 Apr 2008 13:56:15 -0400
> >>>> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> >>>>
> >>>>> On Mon, Apr 07, 2008 at 12:45:01PM -0400, Christoph Hellwig wrote:
> >>>>>> On Mon, Apr 07, 2008 at 09:38:34AM -0400, Jeff Layton wrote:
> >>>>>>> The global task and serv pointers for lockd are normally  
> >>>>>>> protected by
> >>>>>>> the nlmsvc_mutex. The exception is when the lockd exits  
> >>>>>>> abnormally. When
> >>>>>>> this occurs, these variables are cleared without any locking.
> >>>>>>
> >>>>>> Shouldn't we get rid of the case where it exits abnormally  
> >>>>>> instead?
> >>>>>
> >>>>> I tried to figure out when this could actually occur (when can
> >>>>> svc_recv() return an error other than -EINTR or -EAGAIN?), and  
> >>>>> got lost
> >>>>> in sock_recvmsg():
> >>>>>
> >>>>> 	- svc_recv() itself returns only -EAGAIN or the return from
> >>>>> 	  ->xpo_recvfrom().
> >>>>> 	- the only xpo_recvfrom() that's interesting is
> >>>>> 	  svc_tcp_recvfrom(), which can return the error it gets from
> >>>>> 	  svc_recvfrom(), which can return the error from
> >>>>> 	  kernel_recvmsg(), which gets its return from sock_recvmsg().
> >>>>>
> >>>>> Since __sock_recvmsg() has a security hook, it looks like we  
> >>>>> can end up
> >>>>> with an -EACCES from selinux?
> >>>>>
> >>>>> So one case would be selinux deciding we weren't allowed to  
> >>>>> receive
> >>>>> packets from this socket.  Huh.
> >>>>
> >>>> I got lost there too, but I would suspect that there are other  
> >>>> errors
> >>>> that can bubble up from the lower networking layers as well.  
> >>>> Even if
> >>>> there aren't currently, it's probably still prudent to assume  
> >>>> that it's
> >>>> a possibility and code for it.
> >>>>
> >>>> I tend to think the safest thing is probably to do a long sleep  
> >>>> (1s or
> >>>> so and retry when we get an error (maybe also a ratelimited  
> >>>> printk?).
> >>>
> >>> Yeah, I guess I can't think of anything better.
> >>>
> >>
> >> Ok, I went ahead and did patches for this and gave them a quick test
> >> this morning. Obviously, these are hard to fully unit test since this
> >> seems to be a very uncommon occurrence.
> >
> > I suppose this could probably be reproduced with some selinux magic.
> >
> >> Any thoughts?
> >
> > If anyone does ever hit this and it doesn't go away, the printk (even
> > with the ratelimiting) could be pretty annoying, so it might be worth
> > arranging to print this just once.  But perhaps we can wait and see if
> > that actually happens.
> 
> 
> Coding by contract would be useful here.
> 
> If svc_recv() returns only specific error codes (and never anything  
> else) then its callers don't need any special logic for  
> "unrecognized" return codes.
> 
> Thus, ensuring that svc_recv() returns only EINTR, EACCES, or EAGAIN  
> would limit the complexity and failure modes of its callers.
> 

I don't think that gains us much. The errors of interest are basically
EINTR, EAGAIN and "everything else". If we have to deal with EACCES
anyway, then it's not really too much more complexity to just deal with
"everything else" at this level.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html