Hi Christoph,

> From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Sent: Thursday, September 24, 2020 11:24 AM
>
> On Wed, Sep 23, 2020 at 03:35:56PM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 23, 2020 at 06:39:55AM +0100, Christoph Hellwig wrote:
> > > > I wonder if dma_virt_ops ignores the masking.. Still seems best to
> > > > set it consistently when using dma_virt_ops.
> > >
> > > dma_virt_ops doesn't look at the mask.  But in retrospect
> > > dma_virt_ops has been a really bad idea.  I'd much rather move the
> > > handling of non-physical devices back into the RDMA core in the long
> > > run.
> >
> > Doesn't that mean we need to put all the ugly IB DMA ops wrappers back
> > though?
>
> All the wrappers still exist, and as far as I can tell are used by the
> core and ULPs properly.
>
> > Why was dma_virt_ops such a bad idea?
>
> Because it isn't DMA, and now we need crappy workarounds like the one in
> the PCIe P2P code.  It also enforces the same size for dma_addr_t and a
> pointer, which is not true in various cases.  More importantly it forces
> a very strange design in the IB APIs (actually it seems the other way
> around - the weird IB Verbs APIs forced this decision, but it cements it
> in).  For example most modern Mellanox cards can include immediate data
> in the command capsule (sorry NVMe terms, I'm pretty sure you guys use
> something else) that is just copied into the BAR using MMIO.  But the IB
> API design still forces the ULP to dma map it, which is idiotic.

For the 64 B NVMe command and the 16 B NVMe CQE, the IB_SEND_INLINE flag
can be used when posting the send WR.  Here the data referenced by the
ib_sge is a plain virtual address that is placed inline in the WQE; it
doesn't require DMA mapping.  It is memcpy'd to the MMIO area and into the
SQ ring (for deferred access).  The code is at
drivers/infiniband/hw/mlx5/wr.c -> set_data_inl_seg().

To make use of it, qp_init_attr->cap.max_inline_data = 64 should be set
when calling ib_create_qp().

However, a little more plumbing is needed across vendors before the NVMe
ULP can consume this properly.  For example,
drivers/infiniband/hw/i40iw/i40iw_verbs.c trims the inline data to its
internal limit of 48 bytes, while the ULP might have asked for 64 at QP
creation time.  post_send() will then fail later when the command is
posted, which is obviously not expected.  As another example, the
ConnectX-3 mlx4 driver doesn't support it at all.  A while back I posted
a patch [1] for ConnectX-3 to avoid a kernel crash and to support
IB_SEND_INLINE.

[1] https://patchwork.kernel.org/patch/9619537/#20240807
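Roughly, the ULP side would look like the sketch below (untested, against
the kernel verbs API; pd, send_cq, recv_cq and the cmd buffer are assumed
to exist, and the WR/SGE limits are just placeholder values):

	struct ib_qp_init_attr init_attr = {};
	struct ib_qp *qp;
	struct ib_sge sge;
	struct ib_send_wr wr = {};
	const struct ib_send_wr *bad_wr;
	int ret;

	/* Ask the provider for 64 bytes of inline data at QP creation */
	init_attr.qp_type		= IB_QPT_RC;
	init_attr.send_cq		= send_cq;
	init_attr.recv_cq		= recv_cq;
	init_attr.cap.max_send_wr	= 128;		/* placeholder */
	init_attr.cap.max_recv_wr	= 128;		/* placeholder */
	init_attr.cap.max_send_sge	= 1;
	init_attr.cap.max_recv_sge	= 1;
	init_attr.cap.max_inline_data	= 64;		/* NVMe command size */

	qp = ib_create_qp(pd, &init_attr);
	if (IS_ERR(qp))
		return PTR_ERR(qp);

	/*
	 * Post the command inline: sge.addr is a plain kernel VA, no
	 * ib_dma_map_single() needed; the HCA driver memcpy's the bytes
	 * into the WQE instead of DMA-ing through the lkey.
	 * (wr_cqe/wr_id setup for completion handling omitted.)
	 */
	sge.addr	= (uintptr_t)cmd;
	sge.length	= sizeof(*cmd);
	sge.lkey	= pd->local_dma_lkey;

	wr.sg_list	= &sge;
	wr.num_sge	= 1;
	wr.opcode	= IB_WR_SEND;
	wr.send_flags	= IB_SEND_INLINE | IB_SEND_SIGNALED;

	ret = ib_post_send(qp, &wr, &bad_wr);

The ULP could check init_attr.cap.max_inline_data after ib_create_qp() to
see what the provider actually granted, but as noted above not every
driver reports or honours the requested value consistently, which is
exactly the plumbing gap.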