Re: [PATCH V1 libibverbs 1/8] Add ibv_poll_cq_ex verb

On Wed, 2 Mar 2016, Jason Gunthorpe wrote:

> So all this ugly API to minimize cache line usage has no measured
> performance gain?

We have seen an increased cacheline footprint adding ~100-200ns to receive
latencies during loops. This does not show up in synthetic loads that do
not do much processing since their cache footprint is minimal.

> > Does the opaque pointer guarantee an aligned access? Who allocates the
> > space for the vendor's CQE? Any concrete example?
> > One of the problems here is that CQEs could be formatted as -
> > "if QP type is y, then copy the field z from o". Doing it this way may
> > result in doing the same "if" multiple times. The current approach could
> > still avoid memcpy and write straight to the user's buffer.
>
> No, none of that...
>
> struct ibv_cq
> {
>     int (*read_next_cq)(struct ibv_cq *cq, struct common_wr *res);
>     int (*read_address)(struct ibv_cq *cq, struct wr_address *res);
>     uint64_t (*read_timestamp)(struct ibv_cq *cq);
>     uint32_t (*read_immediate_data)(struct ibv_cq *cq);
>     void (*read_something_else)(struct ibv_cq *cq, struct something_else *res);
> };

Argh. You are requiring multiple indirect function calls
to retrieve the same information and therefore significantly increasing
latency. This is going to cause lots of problems for processing at high
speed, where we have to use the processor caches as carefully as possible
to squeeze out all we can get.
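
To make the cost difference concrete, here is a minimal sketch. All names
(demo_cqe, demo_cq, the demo_* functions) are hypothetical and exist only for
illustration; none of this is libibverbs API. Style A issues one indirect
call per field per completion, while style B reads fixed-layout entries
directly with no calls in the loop.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not the libibverbs API. */
struct demo_cqe {
    uint64_t timestamp;
    uint32_t imm_data;
    uint32_t byte_len;
};

struct demo_cq {
    const struct demo_cqe *ring; /* completed entries */
    unsigned int head;           /* entry currently being read */
    unsigned int count;          /* number of valid entries */
    /* Style A: one function pointer per field. */
    uint64_t (*read_timestamp)(struct demo_cq *cq);
    uint32_t (*read_immediate_data)(struct demo_cq *cq);
    uint32_t (*read_byte_len)(struct demo_cq *cq);
};

static uint64_t demo_read_timestamp(struct demo_cq *cq)
{
    return cq->ring[cq->head].timestamp;
}

static uint32_t demo_read_immediate_data(struct demo_cq *cq)
{
    return cq->ring[cq->head].imm_data;
}

static uint32_t demo_read_byte_len(struct demo_cq *cq)
{
    return cq->ring[cq->head].byte_len;
}

/* Style A: three indirect calls for every completion. */
static uint64_t demo_sum_accessors(struct demo_cq *cq)
{
    uint64_t sum = 0;

    for (cq->head = 0; cq->head < cq->count; cq->head++) {
        sum += cq->read_timestamp(cq);
        sum += cq->read_immediate_data(cq);
        sum += cq->read_byte_len(cq);
    }
    return sum;
}

/* Style B: the layout is known, so the loop only touches the entries. */
static uint64_t demo_sum_direct(const struct demo_cqe *ring, unsigned int count)
{
    uint64_t sum = 0;

    for (unsigned int i = 0; i < count; i++)
        sum += ring[i].timestamp + ring[i].imm_data + ring[i].byte_len;
    return sum;
}
```

Both functions compute the same result; the sketch only shows where the
extra calls sit, and makes no claim about measured latency.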

> 4) A basic analysis says this trades cache line dirtying of the wc
>    array for unconditional branches.
>    It eliminates at least 1 conditional branch per CQE iteration by
>    using only 1 loop.

This does none of that stuff at all if the device directly follows the
programmed format. There will be no need to do any driver formatting at
all.

>    Compared to the poll_ex proposal this eliminates a huge
>    number of conditional branches as 'wc_flags' and related no longer
>    exist at all.

wc_flags may indeed be bothersome. You do not want to check them inside the
loop. All CQEs should come with the fields requested, and the layout of the
data must already be fixed by the time we are in the receive loop. No
additional branches in the loop.
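
A rough sketch of the two loop shapes, assuming hypothetical names throughout
(demo_flagged_wc, demo_rx_cqe, DEMO_WC_WITH_*; none of these are libibverbs
definitions). The first loop tests flags for every entry; the second assumes
the layout was agreed on before the loop started, so the body has no
per-entry tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entry flags, similar in spirit to wc_flags. */
enum {
    DEMO_WC_WITH_TS  = 1 << 0,
    DEMO_WC_WITH_LEN = 1 << 1
};

/* Hypothetical completion whose valid fields vary per entry. */
struct demo_flagged_wc {
    uint32_t wc_flags;
    uint64_t timestamp;
    uint32_t byte_len;
};

/* Hypothetical fixed layout, chosen once when the CQ is set up. */
struct demo_rx_cqe {
    uint64_t timestamp;
    uint32_t byte_len;
    uint32_t qp_num;
};

/* Flag-checking loop: two extra conditional branches per entry. */
static uint64_t demo_flagged_bytes(const struct demo_flagged_wc *wc, size_t n,
                                   uint64_t *last_ts)
{
    uint64_t bytes = 0;

    for (size_t i = 0; i < n; i++) {
        if (wc[i].wc_flags & DEMO_WC_WITH_LEN)
            bytes += wc[i].byte_len;
        if (wc[i].wc_flags & DEMO_WC_WITH_TS)
            *last_ts = wc[i].timestamp;
    }
    return bytes;
}

/* Fixed-layout loop: every entry carries the requested fields. */
static uint64_t demo_rx_bytes(const struct demo_rx_cqe *cqe, size_t n,
                              uint64_t *last_ts)
{
    uint64_t bytes = 0;

    for (size_t i = 0; i < n; i++) {
        bytes += cqe[i].byte_len;
        *last_ts = cqe[i].timestamp;
    }
    return bytes;
}
```

When every entry has both flags set, the two loops return the same byte count
and final timestamp; the fixed-layout version simply has nothing left to
decide inside the loop.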
