Re: [PATCH v2 00/12] IB: Replace safe uses for ib_get_dma_mr with pd->local_dma_lkey

On 08/03/2015 08:24 AM, Christoph Hellwig wrote:
> On Fri, Jul 31, 2015 at 03:20:40PM -0700, Bart Van Assche wrote:
>> SRP login fails with this patch series applied on top of Linux kernel
>> v4.2-rc4. At the target side the following message appears every time the
>> SRP initiator tries to log in: "ib_srpt: RDMA t 5 for idx 0 failed with
>> status 10" (10=remote access error). That causes the initiator to receive a
>> flush error (5).
>
> Could this be caused because srp_map_sg_entry falls back to the phys
> mapping for unaligned large requests in line 1389 in Doug's tree:
>
> 	if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
> 	    dma_len > dev->mr_max_size) {
> 		ret = srp_finish_mapping(state, ch);
> 		if (ret)
> 			return ret;
>
> 		srp_map_desc(state, dma_addr, dma_len, target->rkey);
> 		srp_map_update_start(state, NULL, 0, 0);
> 		return 0;
> 	}

This morning I added WARN_ON_ONCE(!target->rkey) statements in the code
paths that use target->rkey and that revealed that the above code
was indeed the culprit. Changing the default registration method from FMR
to FR should be sufficient to keep the above code from being triggered for
HCAs that support FR (e.g. ConnectX-3 and ConnectX-4). For HCAs that
support FMR but not FR, how about telling the block layer that data for
the SRP initiator driver should be aligned on a 4KB boundary? In
srp_add_one() one can see that the finest granularity the SRP initiator
driver supports for memory registration is 4KB (mr_page_shift = ...).

> Bart, do you know what hardware this workaround is for?

I hope the HW vendors can comment on this. Sorry, but I'm not sure which
HCA models and/or firmware versions do not support FMR mapping with a
non-zero offset.

> Also the SRP driver still falls back to phys registrations if the
> better MR methods fail, something that will stop working with
> Jason's patch.  It's something that looks a little questionable
> and which other ULDs don't do either, so it would be a good time
> to review this practice.

Agreed.

> Additionally the SRP_DATA_DESC_INDIRECT case always uses the global
> rkey, so whenever we hit this, things are going to break.

Indeed ... I think avoiding passing the global rkey over the wire in this
case means registering the indirect table explicitly. Some SRP target
implementations (e.g. the upstream ib_srpt driver) do not support partial
indirect tables and fetch the indirect table from the SRP_CMD information
unit. This means that the upstream ib_srpt driver never uses the rkey that
is passed in the indirect data buffer descriptor. Although the upstream SRP
target driver would keep working fine if an invalid rkey were passed in the
indirect data buffer header, I haven't found any clause in the SRP
specification that allows this, even if the entire indirect descriptor is
passed in the SRP_CMD IU.

Bart.


