RE: [EXPERIMENTAL v1 0/4] RDMA loopback device

Hi Ira,

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Ira Weiny
> Sent: Thursday, February 28, 2019 4:16 PM
> To: Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Parav Pandit <parav@xxxxxxxxxxxx>; Dennis Dalessandro
> <dennis.dalessandro@xxxxxxxxx>; bvanassche@xxxxxxx; linux-
> rdma@xxxxxxxxxxxxxxx
> Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> 
> On Thu, Feb 28, 2019 at 09:38:53PM +0200, Leon Romanovsky wrote:
> > On Thu, Feb 28, 2019 at 02:06:53PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
> > > > Sent: Thursday, February 28, 2019 6:39 AM
> > > > To: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky
> > > > <leon@xxxxxxxxxx>
> > > > Cc: bvanassche@xxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> > > > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > > >
> > > > On 2/27/2019 2:49 PM, Parav Pandit wrote:
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Leon Romanovsky <leon@xxxxxxxxxx>
> > > > >> Sent: Wednesday, February 27, 2019 1:56 AM
> > > > >> To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > >> Cc: bvanassche@xxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> > > > >> Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > > > >>
> > > > >> On Wed, Feb 27, 2019 at 12:27:13AM -0600, Parav Pandit wrote:
> > > > >>> This patchset adds RDMA loopback driver.
> > > > >>> Initially for RoCE which works on lo netdevice.
> > > > >>>
> > > > >>> It is tested with with nvme fabrics over ext4, perftests, and rping.
> > > > >>> It only supports RC and GSI QPs.
> > > > >>> It supports only RoCEv2 GIDs which belongs to loopback lo
> netdevice.
> > > > >>>
> > > > >>> It is only posted for discussion [1].
> > > > >>> It is not yet ready for RFC posting or merge.
> > > > >>
> > > > >> Which type of discussion do you expect?
> > > > > Continuation of [1].
> > > > >> And can you give brief explanation why wasn't enough to extend
> rxe/siw?
> > > > >>
> > > > > Adding lo netdev to rxe is certainly an option along with cma
> > > > > patch in this
> > > > series.
> > > > >
> > > > > qp state machine is around spin locks..
> > > > > pools doesn't use xarray that loopback uses and siw intends to use.
> > > > >
> > > > > Incidentally, 5.0.0.rc5 rxe crashes on registering memory.
> > > > > Didn't have
> > > > inspiration to supply a patch.
> > > >
> > > > If rxe crashes we may want to fix it rather than creating a whole new
> driver.
> > > >
> > > > > However rxe as it stands today after several fixes from many is
> > > > > still not
> > > > there.
> > > > > It leaks consumer index to user space and not sure its effect of
> > > > > it. Jason did
> > > > talk some of the security concern I don't recall.
> > > > > A while back when I reviewed the code, saw things that might crash
> kernel.
> > > >
> > > > > Users complain of memory leaks, rnr retries dropping connections..
> > > >
> > > > If rxe is so broken, and there is no interest in fixing it, why do
> > > > we still have it? Should we just excise it from the tree?
> > > >
> > > > > Giving low priority to most of them, I think desire to have
> > > > > loopback rdma
> > > > device are below.
> > > > > 1. rxe is not ready for adding IB link types and large code
> > > > > restructure to
> > > > avoid skb processing in it. Pretty large rewrite to skip skbs.
> > > > > 2. stability and reasonable performance 3. maintainability
> > > >
> > > > I don't see how this is more maintainable. We are adding a new
> > > > driver, a new user space provider. So I don't see that as being a
> > > > reason for adding this.
> > > A new user space provider is less complex at cost of system calls.
> > > However it reuses most kernel pieces present today. User space driver is
> just a wrapper to ibv_cmd().
> > > I see this approach as start on right foot with this approach by not
> writing new code but use existing infra.
> > > And all 3 drivers (rxe, siw, loopback) reuse common user space driver,
> reuse resource allocator, and plugin their transport callbacks.
> > > Or siw should modify the rxe instead of creating those pieces now.
> > >
> > > >
> > > > > But if you think rxe is solid, siw should refactor the rxe code
> > > > > and start
> > > > using most pieces from there, split into library for roce and iw.
> > > > > Once that layering is done, may be loopback can fit that as
> > > > > different L4 so
> > > > that rxe uses skb, siw uses sockets, loopback uses memcpy.
> > > >
> > > > This is why rxe should have used rdmavt from the beginning and we
> > > > would pretty much have such a library.
> > > >
> > > > > Loopback's helper.c is intended to share code with siw for table
> > > > > resources
> > > > as xarray.
> > > > > It also offers complete kernel level handling of data and
> > > > > control path
> > > > commands and published perf numbers.
> > > >
> > > > We can debate back and forth whether this needed to be included in
> > > > siw and rxe, or if it and the others should have used rdmavt.
> > > > However, I think this is different enough of an approach that it
> > > > does stand on its own and could in fact be a new driver.
> > > >
> > >
> > > > The fact that rxe is broken and no one seems to want to fix it
> > > > shouldn't be our reason though.
> > > Same reasoning applies to siw. It should refactor out the code such that
> new L4 piece can be fit in there.
> > > But we are not taking that direction, same reasoning applies to similar
> other driver too.
> >
> > We didn't deeply review SIW yet, everything before was more coding
> > style bikeshedding. If you think that SIW and RXE need to be changed,
> > feel free to share your opinion more loudly.
> 
> I have not really looked at SIW yet either but it seems like there would be a
> lot of similarities to rxe which would be nice to consolidate especially at the
> higher layers.
Yes. The qp state machine, mr state handling, the uapi with rdma-core, and resource id management (qp, mr, srq) are all large pieces of common code.
rxe's skb handling and siw's socket handling would plug in as transport-level callbacks in a common driver.

>  To be fair rdmavt had a lot of special things because of the
> way the hfi1/qib hardware put packets on the wire _not_ using nor wanting
> something like an skb for example.
> 
> My gut says that SIW and rxe are going to be similar, and different from
> hfi1/qib (rdmavt) so I'm not sure trying to combine them will be worth the
> effort.
> 
> As to this "loopback" device I'm skeptical.  SIW and rxe have use cases to
> allow for interoperability/testing.
>
I agree it would be nice to see rxe interoperate with a ConnectX-5 at 100Gbps and match line rate without dropping a connection for an hour.
Given my experience with rxe, and user complaints about dropped connections even on a single system (never mind a ConnectX-5 on the other side), I have my doubts.
Last week I tried this with 5.0.0-rc5, and memory registration in the rxe driver crashed the system.

Does anyone know whether UNH has done any interoperability tests with it?

I will check with our internal QA team whether they do more than touch tests between cx5 and rxe. :-)
 
> What is the real use case for this?
The use case is fairly simple. Groups of developers and users run VMs, on laptops and in the cloud, without hardware HCAs, for development purposes and for automated unit and Jenkins tests of their applications.
Such tests run a few hundred QPs and intend to exchange millions of messages.
These are mainly RoCE users.
rxe in its current state does not fit these needs.

Bart was looking to run on a loopback device (similar to another user request I received) here [1].

So I am looking at something insanely simple but extensible, which can find more uses as we go forward.
The null_blk driver started simple, but its stability led to more features, growing from 3-4 module parameters to configfs-based options today.

I am considering whether the rdma subsystem can offer such a stable 'lo netdev' or 'null block device' style rdma device (starting with RoCE, followed by an IB link); I believe it would be useful.
Apart from user development and application test needs, a loopback driver is useful for developing regression tests for the rdma subsystem.

It would be interesting to hear from Chuck or Bart how it fits their ULP experience.
I used rxe for developing an nvme fabrics dissector a year back, and it was good enough for a few thousand packets.
There have been several fixes since then by well-wishers.

Bart's problem may be partly solvable using patch [2] plus adding the lo device via the new netlink rxe command from Steve; I haven't tried it yet.

[1] https://marc.info/?l=linux-rdma&m=155122131404449&w=2
[2] https://patchwork.kernel.org/patch/10831261/
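If that route works out, the rxe-on-lo binding step might look like this (assuming Steve's netlink command is available through iproute2's rdma tool; the device name rxe_lo is illustrative):

```shell
# Load the soft-RoCE driver and attach an rxe device to the loopback netdev.
# "rdma link add" is the proposed netlink interface; older setups went
# through /sys/module/rdma_rxe/parameters/add instead.
modprobe rdma_rxe
rdma link add rxe_lo type rxe netdev lo

# Verify the new device is visible to the RDMA stack.
rdma link show
ibv_devices
```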
