Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags

Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> · Thu, 9 Jul 2015 14:02:03 +0300

On 7/9/2015 2:36 AM, Jason Gunthorpe wrote:

I'm arguing upper layer protocols should never even see local memory
registration, that it is totally irrelevant to them. So yes, you can
call that a common approach to memory registration if you like..

Basically it appears there is nothing that NFS can do to optimize that
process that cannot be done in the driver/core equally effectively and
shared between all ULPs. If you see something otherwise, I'm really
interested to hear about it.

Even your case of the MR trade off for S/G list limitations - that is
a performance point NFS has no business choosing. The driver is best
placed to know when to switch between S/G lists, multiple RDMA READs
and MR. The trade off will shift depending on HW limits:
  - Old mthca hardware is probably better to use multiple RDMA READ
  - mlx4 is probably better to use FRMR
  - mlx5 is likely best with indirect MR
  - All of the above are likely best to exhaust the S/G list first

The same basic argument is true of WRITE, SEND and RECV. If the S/G
list is exhausted then the API should transparently build a local MR
to linearize the buffer, and the API should be designed so the core
code can do that without the ULP having to be involved in those
details.
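
As a rough illustration of the policy described above, here is a
minimal sketch for the RDMA READ case. Every name in it (map_strategy,
hw_caps, choose_strategy) is hypothetical and not part of the kernel
RDMA API, and the per-device preferences simply restate the
"probably"/"likely" guesses from the list; they are not measurements.

```c
#include <stddef.h>

/* Hypothetical mapping strategies; not kernel API names. */
enum map_strategy {
    MAP_SGL,          /* buffer fits in the S/G list of one work request */
    MAP_MULTI_READ,   /* split the buffer across several RDMA READs */
    MAP_FRMR,         /* fast-registration MR */
    MAP_INDIRECT_MR,  /* indirect MR */
};

/* Hypothetical device classes, named after the drivers in the thread. */
enum hw_class { HW_MTHCA, HW_MLX4, HW_MLX5 };

struct hw_caps {
    enum hw_class cls;
    size_t max_sge;   /* S/G entries available per work request */
};

/*
 * Pick a strategy for a buffer made of nents scattered segments.
 * The S/G list is always exhausted first; only when it is too small
 * does the device class decide the fallback.
 */
static enum map_strategy choose_strategy(const struct hw_caps *caps,
                                         size_t nents)
{
    if (nents <= caps->max_sge)
        return MAP_SGL;

    switch (caps->cls) {
    case HW_MTHCA:
        return MAP_MULTI_READ;
    case HW_MLX4:
        return MAP_FRMR;
    case HW_MLX5:
        return MAP_INDIRECT_MR;
    }
    return MAP_FRMR;  /* assumed default for unknown devices */
}
```

The point of the sketch is only that the decision needs nothing but
device limits and the segment count, neither of which is ULP-specific.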

Is it possible?

Jason

Jason,

We have protocols whose standards involve transferring remote memory
keys, so I don't see how we can remove it altogether from ULPs.

Putting that aside,

My main problem with this approach is that once you do non-trivial
things such as memory registration completely under the hood, it is
a slippery slope for device drivers.

If, say, a driver decides to register memory without the caller knowing,
it would need to post an extra work request on the send queue. Then, once
it sees the completion, it needs to silently consume it and have some
non-trivial logic to invalidate the registration (another work request!)
either from poll_cq context or from another thread.
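
To make the accounting concrete, here is a toy model (a sketch, not
driver code). All names (send_queue, sq_post, post_transfer,
visible_completions) and the queue depth are made up for illustration.
As a simplification it posts the invalidate eagerly together with the
transfer, whereas a real driver would have to post it later, after
seeing the completion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work request kinds; not kernel API names. */
enum wr_kind { WR_REG_MR, WR_DATA, WR_LOCAL_INV };

struct wr {
    enum wr_kind kind;
    bool hidden;              /* posted by the driver, not by the ULP */
};

#define SQ_DEPTH 8            /* arbitrary depth for the example */

struct send_queue {
    struct wr slots[SQ_DEPTH];
    size_t used;
};

/* Post one work request; fails when the queue is full. */
static bool sq_post(struct send_queue *sq, enum wr_kind kind, bool hidden)
{
    if (sq->used == SQ_DEPTH)
        return false;
    sq->slots[sq->used].kind = kind;
    sq->slots[sq->used].hidden = hidden;
    sq->used++;
    return true;
}

/*
 * The ULP asks for ONE transfer. If the driver decides it needs an MR,
 * it adds a registration and an invalidation behind the caller's back.
 * Returns the number of send queue slots consumed, or 0 if the queue
 * could not hold all of them (nothing stays posted in that case).
 */
static size_t post_transfer(struct send_queue *sq, bool needs_mr)
{
    size_t before = sq->used;

    if (needs_mr && !sq_post(sq, WR_REG_MR, true))
        goto fail;
    if (!sq_post(sq, WR_DATA, false))
        goto fail;
    if (needs_mr && !sq_post(sq, WR_LOCAL_INV, true))
        goto fail;
    return sq->used - before;

fail:
    sq->used = before;        /* roll back the partial post */
    return 0;
}

/* Completions the ULP expects; the driver must swallow the rest. */
static size_t visible_completions(const struct send_queue *sq)
{
    size_t n = 0;

    for (size_t i = 0; i < sq->used; i++)
        if (!sq->slots[i].hidden)
            n++;
    return n;
}
```

In this model a single request from the caller can take three slots,
and two of the three completions have to be filtered out before the
ULP sees anything.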

Moreover, this also means that the driver needs to allocate bigger send
queues for possible future memory registration (depending on the IO
pattern maybe). And I really don't like an API that instructs the user
"please allocate some extra room in your send queue as I might need it".
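
For a sense of scale, a back-of-the-envelope sizing helper. This is a
sketch under the assumption that each hidden registration costs two
extra work requests (one to register, one to invalidate); the function
name and the constant are hypothetical.

```c
#include <stddef.h>

/* Assumption: one registration WR plus one invalidation WR. */
#define HIDDEN_WRS_PER_TRANSFER 2

/*
 * Send queue depth needed when the ULP wants ulp_wrs outstanding
 * requests and up to max_registered of them may be registered
 * behind its back.
 */
static size_t sq_depth_needed(size_t ulp_wrs, size_t max_registered)
{
    if (max_registered > ulp_wrs)
        max_registered = ulp_wrs;
    return ulp_wrs + HIDDEN_WRS_PER_TRANSFER * max_registered;
}
```

With 128 outstanding requests, the worst case under this assumption is
a queue three times as deep (384 entries), and the ULP cannot know
which case applies because the IO pattern decides it.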

This would also require the drivers to take a heuristic approach to how
many memory registration resources are needed for all possible
consumers (ipoib, sdp, srp, iser, nfs, more...), which might have
different requirements.

I know that these are implementation details, but the point is that
vendor drivers can easily become a complete mess. I think we should
try to find a balanced approach where both consumers and providers are
not completely messed up.

Sagi.