Re: Upcoming libibverbs release

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/23/2016 7:51 PM, Doug Ledford wrote:
I'm preparing to make an official release of libibverbs soonish
(thinking by the end of next week).  Even though I took the timestamp v5
patchset, I know there were still comments from Jason on the v4 set that
weren't addressed in v5.  Those can be addressed (and should be) with
incremental patches prior to the official release.  Yishai, can we
please prioritize getting that done?


Listed below the main openings that were raised by Jason and our input on, considering their need and priority.

Most of already answered as part of the discussion, will summarize it all together to help evaluating the status/next steps.


> I'm still very much of the opinion that extended CQs should only allow
> compatible QP's to be added.
--------------------------------------

Application may create a CQ with large subset of flags that may fit some of its attached QP types. Semantically, it is similar to what we have today, it allows to put different QP types on the same CQ and by thus allow application to have ordered completion events.

We don't see a good reason to change from current behavior and hurt applications that expect that valid behavior.

> Add a compatibility layer
-------------------------------

A real application wants to achieve performance, to use a compatibility layer over a vendor that doesn’t support the new API, might be a very bad idea as of below.

1) The legacy API doesn't support the batching API that was introduced by the new API, this will lead to have a Door-Bell/PCI write per CQE as internally the compatibility layer will need to call ibv_poll_cq with size of 1.

2) There will 2 copies, one from the HW CQE to ibv_wc as part of the internal legacy API, second, from the ibv_wc to the read_xxx output, comparing the new API.

3) The *non-required* fields will be copied anyway as part of the internal legacy ibv_poll_cq from the HW CQE to ibv_wc and hits the performance.

In addition,
There is *no* one to one mapping in all the cases between the new API to the legacy one, it will lead to a *non* transparent usage of the compatibility layer and ends by returning an error to the application in few cases.

Specifically,
One of major benefits in the new API is the ability to say as part of CQ creation that it's going to be used by only one thread and have lock less flow as part of polling the CQ. This is not supported in the legacy API which assumes that multi-threaded access might be used.

It makes sense to return an error in the compatibility layer API letting application knows that this service is not supported when it was asked by the application.

Applications that look for performance expect to work in that single threaded mode and might fail as part of using the compatibility layer or alternatively have no idea that they didn't get the service which assumes to be bad behavior.

In addition, at the moment timestamping can’t be read via the legacy API means that the compatibility layer can’t handle it and should supply an error for. In the future there may be other fields/flows that can’t be supported by the compatibility layer so application that will be written over this layer will fail and won’t get the required benefit from using it.

To sum it up:
We don't see a real usage of such a layer, it has many limitations.
In case will be a real request from users, it can always be added in the future, minor priority for now.

> Exact set of access flags
------------------------------

Application can ask for subset of the available bits, in addition, it has an option to ask for all legacy ones (i.e. IBV_WC_STANDARD_FLAGS) for a case that a CQ needs to support all legacy attribute and could be attached to both RC and UD QPs.

We didn't find it useful at the moment to add other enums per QP type as application can say explicitly what it wants by using the ORed bits and even per QP type application may not be always interested in all the fields. (e.g. CQ that used for TX only completions in RC is not interested in few fields that are relevant for the RX).

To sum it up:
Low priority issue that can be added in the future based on users demand.

> Exact set of accessors
--------------------------------

At the moment the API introduces an option to get each field by a separate accessor (i.e ibv_read_wc_xxx), in addition, wr_id and status have direct access as application expect to use them in most of their flows.

During the discussion come up some ideas to add some extra accessors per QP type (e.g. ibv_wc_read_ud, ibv_wc_read_rc, etc.).

This option makes sense when application really plans to read and use all of the potential UD/RC fields, which *can't* be assumed as true in the general case.

Even in a UD/RC QP the application may be interested in few of the CQE fields while others are not needed.

From bench mark testing, we found that gathering few fields in one ibv_wc_read_xxx has low benefit comparing the case that each field is read by itself.

In a case that *not* all fields are really needed the extra copy to return the redundant fields found to be more expensive comparing the ibv_wc_read_xxx when needed.

We can think of an application that may be interested in subset of fields per QP type or per send/receive flows to be gathered in one call, this ability can always be added in the future as the API cleanly
enables extensions.

If you still think that adding some extra ibv_wc_read_xxx groups from day one is really needed, let's define it and add now by an incremental patch. Till now didn't see any user specific request for.

> Minor coding issues and micro optimizations
------------------------------------------------

Not sure what exactly that point refers to from API point of view, assuming that it relates to few comments given on libmlx5 as of below.

> +static inline uint32_t mlx5_cq_read_wc_qp_num(struct ibv_cq_ex *ibcq)
> +{
> +	struct mlx5_cq *cq = to_mcq(ibv_cq_ex_to_cq(ibcq));

> You might want to look at having the caller pass in 'cq' from a member
> in ibv_cq_ex. That way the caller can hold the cq in a register
> instead of constantly reloading it like this - I'm assuming to_mcq is
> not just an container_of??

The assumption is *incorrect*, the internal access from ibv_cq_ex to driver CQ is done via offsetof/container_of which is done in compile time, no need to have some change from the clean usage of the API to that mode.


> +static inline uint32_t mlx5_cq_read_wc_qp_num(struct ibv_cq_ex
> *ibcq)
> You need to go through everything you've written and make sure it is
> const-correct.
> You should also annotate with restrict.
> Doing this properly can increase performance by allowing the caller to
> avoid reloads.

From performance point of view we don't expect a change here, we followed the created assembly code and saw no difference even when it was annotated with 'restrict'.

In addition, please note the 'restrict' notation was only became standard as part of C99, in previous compilers it may not work at all or requires different key word to be defined as part of the CONFIG step.

We can go ahead and define the ibv_wc_read_xxx with const qualifier on its input struct ibv_cq_ex *ibcq in some extra incremental patch if makes sense.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux