RE: [PATCH rdma-next] RDMA: Explicitly pass in the dma_device to ib_register_device

Hi Christoph,

> From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Sent: Thursday, September 24, 2020 11:24 AM
> 
> On Wed, Sep 23, 2020 at 03:35:56PM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 23, 2020 at 06:39:55AM +0100, Christoph Hellwig wrote:
> > > > I wonder if dma_virt_ops ignores the masking.. Still seems best to
> > > > set it consistently when using dma_virt_ops.
> > >
> > > dma_virt_ops doesn't look at the mask.  But in retrospective
> > > dma_virt_ops has been a really bad idea.  I'd much rather move the
> > > handling of non-physical devices back into the RDMA core in the long
> > > run.
> >
> > Doesn't that mean we need to put all the ugly IB DMA ops wrappers back
> > though?
> 
> All the wrappers still exist, and as far as I can tell are used by the core and
> ULPs properly.
> 
> > Why was dma_virt_ops such a bad idea?
> 
> Because it isn't DMA, and now we need crappy workarounds like the one in
> the PCIe P2P code.  It also enforces the same size for dma_addr_t and a
> pointer, which is not true in various cases.  More importantly it forces a very
> strange design in the IB APIs (actually it seems the other way around - the
> weird IB Verbs APIs forced this decision, but it cements it in).  For example
> most modern Mellanox cards can include immediate data in the command
> capsule (sorry NVMe terms, I'm pretty sure you guys use something else)
> that is just copied into the BAR using MMIO.  But the IB API design still forces
> the ULP to dma map it, which is idiotic.

For the 64 B nvme command and the 16 B nvme cqe, the IB_SEND_INLINE flag can be used while posting the send WR.
Here the data is placed inline in the WQE, with the ib_sge holding a plain VA.

This inline data doesn't require dma mapping.
It is memcpy'ed to the MMIO area and into the SQ ring (for deferred access).

Code is located at drivers/infiniband/hw/mlx5/wr.c -> set_data_inl_seg()

To make use of it, ib_create_qp() should be called with qp_init_attr->cap.max_inline_data = 64.
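Roughly, the flow looks like this. This is only a sketch against the kernel ib_verbs API, with error handling trimmed; the helper name post_cmd_inline and the cq/qp setup around it are illustrative, not taken from any driver:

```c
/* Sketch: create a QP that can carry 64 B of inline data, then post
 * a command inline.  No ib_dma_map_single() is needed for the payload;
 * the provider memcpy's it from the VA into the WQE.
 */
struct ib_qp_init_attr init_attr = {
	.send_cq = cq,
	.recv_cq = cq,
	.qp_type = IB_QPT_RC,
	.cap = {
		.max_send_wr     = 16,
		.max_recv_wr     = 16,
		.max_send_sge    = 1,
		.max_recv_sge    = 1,
		.max_inline_data = 64,	/* one 64 B nvme command capsule */
	},
};

static int post_cmd_inline(struct ib_qp *qp, void *cmd, u32 len)
{
	struct ib_sge sge = {
		.addr   = (uintptr_t)cmd,	/* plain VA, no dma mapping */
		.length = len,
		.lkey   = 0,			/* not used for inline sends */
	};
	struct ib_send_wr wr = {
		.opcode     = IB_WR_SEND,
		.send_flags = IB_SEND_INLINE | IB_SEND_SIGNALED,
		.sg_list    = &sge,
		.num_sge    = 1,
	};
	const struct ib_send_wr *bad_wr;

	return ib_post_send(qp, &wr, &bad_wr);
}
```

If len exceeds the max_inline_data the device actually granted, the provider rejects the WR at post time, which is exactly the vendor inconsistency described below.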

However, a little more plumbing is needed across vendors for the nvme ULP to consume this properly.
For example,
drivers/infiniband/hw/i40iw/i40iw_verbs.c trims the inline data to its internal limit of 48 bytes, while the ULP might have asked for 64 at QP creation time.
ib_post_send() will then fail later when the command is posted, which is obviously not expected.

In another example, the ConnectX-3 mlx4 driver doesn't support it at all.
A while back I posted a patch [1] for ConnectX-3 to avoid a kernel crash and to support IB_SEND_INLINE.

[1] https://patchwork.kernel.org/patch/9619537/#20240807



