Re: Confused about BUG_ON in rpcb_getport_async

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tuesday August 12, trond.myklebust@xxxxxxxxxx wrote:
> On Tue, 2008-08-12 at 12:27 +1000, Neil Brown wrote:
> > Hi 
> >  I have a report of a the BUG_ON in rpcb_getport_clnt being
> >  triggered.  This is:
> > 	/* Autobind on cloned rpc clients is discouraged */
> > 	BUG_ON(clnt->cl_parent != clnt);
> > 
> >  It looks to me that while they might be discouraged, they are not
> >  prevented and so having the BUG_ON is wrong.
> > 
> >  When rpc_clone_client creates a clone, it sets cl_autobind to 0,
> >  and gives the new client a reference to the same cl_xprt as the
> >  original client.
> > 
> >  The only effect of cl_autobind is to prevent rpc_force_rebind from
> >  clearing the BOUND flag on ->cl_xprt.
> >  So while a call to rpc_force_rebind on the clone will not clear that
> >  flag, a call on the original client will clear that flag.
> > 
> >  So a cloned client can still end up with a ->cl_xprt with the BOUND
> >  flag clear.
> > 
> >  So call_bind (which is present in the call trace under the oops) can
> >  find that !xprt_bound, even when the client is a cloned client.
> > 
> >  When this happens, ->rpcbind, which is rpcb_getport_clnt, goes BOOM.
> > 
> >  What should happen when a clone client finds that its transport is no
> >  longer bound?  Should rpc_getport_async just do
> >     clnt = task->tk_client->cl_parent;
> >  ??
> 
> This shouldn't be a particularly urgent problem, since lockd should
> normally be the only thing that needs to use the autobind functionality,
> and it does not use cloned clients.
> 
> That said, I think that the answer is that cloned clients may indeed use
> rpcbind as long as they share the same program number and version as the
> parent.
> IOW: we should probably rather BUG_ON() calls to rpc_bind_new_program()
> if the parent has xprt->cl_autobind set.

Just to confirm,  you are saying that the current BUG_ON is indeed
wrong, and the fact that it fires does not imply any real bug.
It should be removed.  Possibly a BUG_ON could be put somewhere else
to catch incorrect behaviour.

Is that correct?

I now have multiple people for whom this BUG_ON has fired.

Thanks,
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux