On Fri, 2010-10-22 at 11:56 -0400, Chuck Lever wrote:
> On Oct 21, 2010, at 3:38 PM, Trond Myklebust wrote:
>
> > On Thu, 2010-10-21 at 13:33 -0500, Ben Myers wrote:
> >> Retry bind for reserved source ports forever. Add an error message when we
> >> have a hard time binding one.
> >
> > NACK. This approach leads to the process spinning forever in that loop,
> > which is exactly why we introduced the limit in the first place. See all
> > the old archived bug report emails about 'rpciod taking 100% cpu'.
>
> The root problem seems to be the hard loop. Thinking out loud, what if the
> client's FSM or some other higher up layer performed the retry, with a short
> delay inserted after each attempt?

The problem isn't only the hard loop. The reason why we return the
EADDRINUSE is in order to allow quick failure of mounts and/or automounts
when we can't bind the socket.

I suggest 2 changes:

     1. In case of error, pass the return value from xs_bind to the
        pending tasks
     2. Add a handler for EADDRINUSE in call_status(),
        xprt_connect_status() and call_connect_status(). Make sure that
        call_status() and call_connect_status() fail for SOFTCONN tasks,
        and that they print an error message, delay and retry in the
        case of ordinary hard tasks.

Cheers
  Trond
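
To make the two suggested changes concrete, here is a rough user-space
sketch of the intended behaviour. It is not kernel code and not a patch:
struct sketch_task, sketch_wake_pending() and sketch_handle_status() are
hypothetical names invented for illustration, and the real handlers
(call_status(), xprt_connect_status(), call_connect_status()) would need
to do this in their own terms. The actual delay before the retry is left
to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an RPC task; NOT the kernel's struct rpc_task. */
struct sketch_task {
	bool soft_connect;	/* models a SOFTCONN task */
	int status;		/* error handed over from the bind attempt */
	unsigned int retries;	/* how often a hard task was told to retry */
};

/* What the (hypothetical) status handler tells its caller to do next. */
enum sketch_action {
	SKETCH_OTHER,			/* status not handled in this sketch */
	SKETCH_FAIL,			/* fail the task quickly */
	SKETCH_RETRY_AFTER_DELAY	/* log, delay, then retry */
};

/* Change 1: pass the bind return value on to every pending task. */
static void sketch_wake_pending(struct sketch_task **tasks, size_t n,
				int bind_err)
{
	for (size_t i = 0; i < n; i++)
		tasks[i]->status = bind_err;
}

/* Change 2: handle EADDRINUSE according to the kind of task. */
static enum sketch_action sketch_handle_status(struct sketch_task *task)
{
	assert(task != NULL);

	if (task->status != -EADDRINUSE)
		return SKETCH_OTHER;

	/* Soft-connect tasks fail at once so mounts/automounts fail fast. */
	if (task->soft_connect)
		return SKETCH_FAIL;

	/* Ordinary hard tasks: complain, then ask for a delayed retry. */
	fprintf(stderr, "sketch: bind returned EADDRINUSE, will retry\n");
	task->retries++;
	task->status = 0;
	return SKETCH_RETRY_AFTER_DELAY;
}
```

The point of returning an action instead of looping inside the handler is
that nothing spins: a hard task is requeued after a delay chosen by the
caller, while a soft-connect task sees the error immediately.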