Re: [PATCH] RDMA/core: Add wait/retry version of ibnl_unicast

Chien Tin Tung <chien.tin.tung@xxxxxxxxx> · Thu, 29 Jun 2017 14:31:46 -0500

On Thu, Jun 29, 2017 at 07:37:19PM +0300, Leon Romanovsky wrote:
> On Thu, Jun 29, 2017 at 10:02:49AM -0500, Chien Tin Tung wrote:
> > On Thu, Jun 29, 2017 at 08:04:36AM +0300, Leon Romanovsky wrote:
> > > On Wed, Jun 28, 2017 at 03:30:03PM -0500, Chien Tin Tung wrote:
> > > > On Wed, Jun 28, 2017 at 06:36:39PM +0300, Leon Romanovsky wrote:
> > > > > On Wed, Jun 28, 2017 at 09:12:11AM -0500, Chien Tin Tung wrote:
> > > > > > On Wed, Jun 28, 2017 at 09:02:45AM -0500, Mustafa Ismail wrote:
> > > > > > > Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait,
> > > > > > > and modify ibnl_unicast to not wait/retry.  This eliminates
> > > > > > > the undesirable wait for future users of ibnl_unicast.
> > > > > > >
> > > > > > > Change Portmapper calls originating from kernel to user-space
> > > > > > > to use ibnl_unicast_wait and take advantage of the wait/retry
> > > > > > > logic in netlink_unicast.
> > > > > > >
> > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
> > > > > > > Signed-off-by: Chien Tin Tung <chien.tin.tung@xxxxxxxxx>
> > > > > > > ---
> > > > > > >  drivers/infiniband/core/iwpm_msg.c |  6 +++---
> > > > > > >  drivers/infiniband/core/netlink.c  | 12 +++++++++++-
> > > > > > >  include/rdma/rdma_netlink.h        | 10 ++++++++++
> > > > > > >  3 files changed, 24 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > Please apply this patch instead of Leon's patch to revert
> > > > > > "IB/core: Add flow control to the portmapper netlink calls".
> > > > > >
> > > > > > Leon, we can work out names and parameters if this works for you.
> > > > >
> > > > > Chien,
> > > > >
> > > > > The names are less my worries with this patch. First of all, it misleads
> > > > > by using wait/retry naming, because it blocks and not waits.
> > > >
> > > > Nope.  It does a single shot retry and waits in a waitqueue.
> > > > Go look at netlink_unicast and in turn netlink_attachskb.  If you still
> > > > disagree, please flag specific code where it blocks.
> > >
> > > I agree, it wouldn't block in your scenario. However will it work in more hostile
> > > environments?
> >
> > The reason behind the original patch was to make it more tolerable to "hostile"
> > environments by giving it one more try allocate buffer to send the message.
> > So yes, it does work in more hostile environment.
> 
> Are you really calling extra load as a "hostile environment"?

I'm not playing 20 questions (https://en.wikipedia.org/wiki/Twenty_Questions).
If you have something in mind say so.  If you are referring to your malicious
user attack scenario see my response below.

> > > For example, malicious user can open RDMA netlink socket directly (socket(...)),
> > > set sndtimeo to be MAX_SCHEDULE_TIMEOUT - 1 (LONG_MAX - 1)
> >
> > How do you open the _kernel_ RDMA netlink socket directly from user-space and
> > set the timeout value?
> 
> Exactly as libnl, libmnl and rdma-core do. The netlink socket is
> accessible from the user space without root access. It is the whole
> purpose of it.

Wow, you just uncovered a big security hole.  Isn't this beyond a simple timeout
value?  Wouldn't the ability to set the socket to _blocking_ blow your top?

> > > and send custom
> > > netlink messages right to your new _wait function. If I understand correctly
> > > from the code, it will add them to waitqueue and won't release skb till
> > > the end of processing.
> >
> > The only caller of ibnl_unicast_wait is portmapper kernel client.  The call is
> > initiated from kernel not in response to user message so you did not understand
> > the code correctly.
> 
> Maybe and maybe you didn't read your code at all.

I'm gonna give you a pass on this one.

> Calls to _unicast are triggered from kernel for LS only, in you case as
> an example the request to get RDMA_NL_IWPM_MAPINFO will trigger ibnl_unicast_wait.

LS?  I don't have your acronym database in my head...

With Mustafa's patch, this is the flow with RDMA_NL_IWPM_MAPINFO
from user-space into the kernel:

iwpm_mapping_info_cb()
	iwpm_send_mapinfo()
		send_mapinfo_num()
		send_nlmsg_done()

Both send_mapinfo_num and sen_nlmsg_done call ibnl_unicast()
not ibnl_unicast_wait().

I know it is late where you are.  So let's talk about the spirit of the patch.
Anything that is in response to user-space should use ibnl_unicast.  If you
find something that's using ibnl_unicast_wait, then the patch is wrong
and will be corrected.  Now, is there actually one?  Please include call
flow if you find one.

> > > Will it cause to mark whole netlink socket as NETLINK_S_CONGESTED?
> >
> > No.  It has nothing to do with the new function or relevant to this discussion.
> 
> Again, read the source, the netlink socket overrun will set that flag.

Sorry, this is still beyond me.  What's the connection to wait/retry?
How is using wait/retry impacting setting of this flag?
Are you back to your malicious user attack scenario?  See response below.

> > > Will other users will be able to progress with their messages or they
> > > will need to wait till those _wait calls finish?
> >
> > Short answer is yes.  With this patch other users of ibnl_unicast would not
> > get the retry/wait.  If you choose to use the wait version, obviousely you will
> > have retry/wait.  For the work you are doing, you should stick with ibnl_unicast
> > and there's no impact to your code.  This is the reason for the two functions as
> > proposed originally.
> 
> Again, read the code and my proposed scenario. User without root
> permissions can open netlink socket (I'm doing it in rdmatool) and can
> send any message he wants. He can and will overflow the calls with
> ibnl_unicast_wait, so no other user will have resources available.

So, in the spirit of the patch, this cannot happen.  End of story.
Ignoring rest of your email.

> > > > > The second, I disagree with solution in kernel for user space application which can't
> > > > > handle the netlink errors.
> > > >
> > > > There is no guarantee delivery nor blocking on send.  Like I mentioned above,
> > > > it is a 1 shot retry with a set wait time.  The code obviousely handles error
> > > > condition as it can happen.
> > >
> > > So, can you please refresh our memory and explain again what exactly
> > > this patch is fixing if user-space handles errors correctly?
> >
> > You assertion of user-space flaw is incorrect.  It is based on incorrect
> > conclusion of deadlock and blocking.
> >
> > There is no deadlock and no blocking, period.
> 
> That question doesn't assume anything. I'm still asking what exactly are
> you fixing?
> >
> >
> > So what's the issue with taking this patch instead of the revert?
> 
> It doesn't fix anything broken in kernel.
> 
> >
> > Chien
> >

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html