Re: kernel memory registration (was: RDMA/core: Transport-independent access flags)

On Fri, Jul 10, 2015 at 11:55:29AM +0300, Sagi Grimberg wrote:

> IMHO, memory registration is memory registration. The fact that we are
> distinguishing between local and remote might be a sign that this might
> be a wrong direction to take. Sorry.

I believe they are very different. Yes, if you sit at the level of the
specification or maybe even an HCA design, they are nice and similar,
but for someone writing a ULP, local registration is not helping.

> What if a ULP has a pre-allocated pool of large buffers that it knows
> it is going to use for its entire lifetime? silent driver driven FRWRs
> would perform a lot worse than pre-registering these buffers.

Okay, so I think you've gone too far down the path here. I am
proposing an API direction that hides the lkey entirely and provides a
common posting path for ULPs that need dynamic, short-term, on-the-fly
MR creation. This is basically every ULP in the kernel today.

I'm not saying we should rip out all of the lkey stuff and hobble
ib_post_send, only that this new API, aimed *specifically* at
simplifying our *current* universe of ULPs should do that.

> Or what if the ULP wants to register the memory region with data
> integrity (signature) parameters?

Then it calls rdma_post_write_dif(..)/etc. Is that not good enough?

> If there is one thing worse than a complicated API, it is a restrictive
> one. I'd much rather ULPs just having a simple API for registering
> memory.

No, strongly disagree. A restrictive API that solves exactly the
problem our ULPs today face is *exactly* what we need here.

This is in-kernel. It isn't a UAPI. It isn't an industry standard. We
can change and revise it next year if we need.

> >I'm not really seeing anything here that screams out this is
> >impossible, or performance is impacted, or it is too messy on either
> >the ULP or driver side.
> 
> I think it is possible (at the moment). But I don't know if we should
> have the drivers abusing the send/completion queues like that.
> 
> I can't say I'm fully on board with the idea of silent send-queue
> posting and silent completion consuming.

I'm not sure I'd call it silent, it tells the ULP how many slots it
will use.
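
To illustrate the accounting, here is a rough userspace sketch. Every
name in it (ulp_sq, ulp_post, ulp_complete) is made up for this mail
and is not an existing kernel interface. The only assumption is that
the post call reports how many send queue slots it takes, so the ULP
can budget for them:

```c
#include <assert.h>

/* Hypothetical ULP-side view of a send queue; not a kernel struct. */
struct ulp_sq {
	int depth;   /* total send queue slots the QP was created with */
	int in_use;  /* slots currently held by outstanding work */
};

/*
 * Stand-in for the proposed post call. 'slots' is the number of send
 * queue slots the driver reports for this operation. Returns that
 * number on success, or -1 if the queue does not have enough room.
 */
static int ulp_post(struct ulp_sq *sq, int slots)
{
	if (slots <= 0 || sq->in_use + slots > sq->depth)
		return -1;
	sq->in_use += slots;
	return slots;
}

/* Give the slots back once the operation has completed. */
static void ulp_complete(struct ulp_sq *sq, int slots)
{
	assert(slots > 0 && slots <= sq->in_use);
	sq->in_use -= slots;
}
```

The ULP never sees which work requests were posted on its behalf, only
how much of its queue depth they consume.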

> >I expect all these calls would be function pointers, and each driver
> >would provide a function pointer that is optimal for it's use. Eg mlx4
> >would provide a pointer that used the S/G list, then falls back to
> >FRMR if the S/G list is exhausted. The core code would provide a
> >toolbox of common functions the drivers can use here.
> 
> Maybe it's just me, but I can't help but wander if this is facilitating
> an atmosphere where drivers will keep finding new ways to abuse even
> the most simple operations.

Maybe, I'm not sure. Being restrictive with the API certainly prevents
a lot of 'creative' uses. It is hard to argue about what
rdma_post_rdma_read should do, it can be made quite narrowly defined.

If mlx4 implements this with a FRMR call and mlx5 uses Indirect MR,
and qib implements it with a non-standard extended scatter list - do I
care? Not really.
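
A minimal userspace sketch of that dispatch, to show the shape only.
All names here (sketch_qp, sketch_rdma_ops, the *_like_* functions) are
hypothetical, and the slot counts are invented placeholders rather than
what any real driver would need:

```c
#include <stddef.h>

/* Hypothetical: which mechanism the driver picked for the transfer. */
enum xfer_method { XFER_SGL, XFER_FRMR, XFER_INDIRECT_MR };

struct sketch_qp;

/* Hypothetical per-driver ops table; not an existing kernel struct. */
struct sketch_rdma_ops {
	/* Returns the number of send queue slots consumed. */
	int (*post_rdma_read)(struct sketch_qp *qp, size_t nr_pages);
};

struct sketch_qp {
	const struct sketch_rdma_ops *ops;
	size_t max_sge;          /* S/G entries a single WR can carry */
	enum xfer_method last;   /* recorded only so the sketch is testable */
};

/* mlx4-like: use the S/G list, fall back to FRMR when it is exhausted. */
static int mlx4_like_post_read(struct sketch_qp *qp, size_t nr_pages)
{
	if (nr_pages <= qp->max_sge) {
		qp->last = XFER_SGL;
		return 1;   /* placeholder: just the READ itself */
	}
	qp->last = XFER_FRMR;
	return 3;           /* placeholder: register + READ + invalidate */
}

/* mlx5-like: always goes through an indirect MR in this sketch. */
static int mlx5_like_post_read(struct sketch_qp *qp, size_t nr_pages)
{
	(void)nr_pages;
	qp->last = XFER_INDIRECT_MR;
	return 2;           /* placeholder count */
}

static const struct sketch_rdma_ops mlx4_like_ops = {
	.post_rdma_read = mlx4_like_post_read,
};

static const struct sketch_rdma_ops mlx5_like_ops = {
	.post_rdma_read = mlx5_like_post_read,
};

/* What the ULP calls; it never learns which method was chosen. */
static int sketch_post_rdma_read(struct sketch_qp *qp, size_t nr_pages)
{
	return qp->ops->post_rdma_read(qp, nr_pages);
}
```

The ULP only ever calls sketch_post_rdma_read(); the choice between
S/G list, FRMR, or anything else stays behind the function pointer.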

But it does seem to provide a much saner way for vendors to add
extensions, i.e. I think I'd rather see a rdma_post_write_dif than a
bunch of non-standard extensions in FRMR flags and WR attributes.

> I need more time to comprehend.

Please think about it, I'm pretty sure the iWarp guys *have* to go
down this road, it is a good way for them to implement their
quirk on RDMA READ across many ULPs.

Understand the iWarp problem is that they cannot use a phys dma MR for
their RDMA READ lkey - this is a major difference from IB.

The issue is larger than just memory registration.

> My intention is to improve FRWR API and gradually remove the other APIs
> from the kernel (i.e. FMR/FMR_POOL/MW). As I said, I don't think that
> striving to an API that implicitly chooses how to register memory is a
> good idea.

Can you explain why? And I mean specifically - how will
NFS/ISER/SRP/Lustre specifically be impacted if we move that choice
into the core/driver layer?

Jason