On 08/03/2015 08:24 AM, Christoph Hellwig wrote:
> On Fri, Jul 31, 2015 at 03:20:40PM -0700, Bart Van Assche wrote:
>> SRP login fails with this patch series applied on top of Linux kernel
>> v4.2-rc4. At the target side the following message appears every time the
>> SRP initiator tries to log in: "ib_srpt: RDMA t 5 for idx 0 failed with
>> status 10" (10=remote access error). That causes the initiator to receive a
>> flush error (5).
> Could this be caused by srp_map_sg_entry falling back to the phys
> mapping for unaligned or large requests in line 1389 in Doug's tree:
>
>	if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
>	    dma_len > dev->mr_max_size) {
>		ret = srp_finish_mapping(state, ch);
>		if (ret)
>			return ret;
>
>		srp_map_desc(state, dma_addr, dma_len, target->rkey);
>		srp_map_update_start(state, NULL, 0, 0);
>		return 0;
>	}
This morning I added WARN_ON_ONCE(!target->rkey) statements to the code
paths that use target->rkey, and that revealed that the above code
was indeed the culprit. Changing the default registration method from
FMR to FR should be sufficient to prevent the above code from being
triggered for HCAs that support FR (e.g. ConnectX-3 and ConnectX-4).
For HCAs that support FMR but not FR, how about telling the block layer
that data for the SRP initiator driver should be aligned on a 4KB
boundary? In srp_add_one() one can see that the finest granularity the
SRP initiator driver supports for memory registration is 4KB
(mr_page_shift = ...).
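
To make the condition concrete, here is a minimal user-space sketch of the
check quoted above. It is not the driver code: the 4KB granularity
(MR_PAGE_SHIFT = 12) is an assumption taken from the discussion, and
needs_phys_fallback() is a hypothetical helper name used only for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KB registration granularity, as discussed above. */
#define MR_PAGE_SHIFT 12
#define MR_PAGE_SIZE  ((uint64_t)1 << MR_PAGE_SHIFT)
#define MR_PAGE_MASK  (~(MR_PAGE_SIZE - 1))

/*
 * Hypothetical helper mirroring the quoted condition. Returns true if an
 * S/G element would bypass memory registration and be described with the
 * global rkey instead.
 */
static bool needs_phys_fallback(bool use_fast_reg, uint64_t dma_addr,
				uint64_t dma_len, uint64_t mr_max_size)
{
	/* Non-zero offset within a registration page. */
	bool unaligned = (dma_addr & ~MR_PAGE_MASK) != 0;

	return (!use_fast_reg && unaligned) || dma_len > mr_max_size;
}
```

With FMR (use_fast_reg == false) any element whose address has a non-zero
offset within a 4KB page takes the fallback path, which is why aligning the
data handed to the driver would avoid it. An oversized element takes the
fallback regardless of the registration method.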
> Bart, do you know what hardware this workaround is for?
I hope the HW vendors can comment on this. Sorry, but I'm not sure which
HCA models and/or firmware versions do not support FMR mapping with a
non-zero offset.
> Also the SRP driver still falls back to phys registrations if the
> better MR methods fail, something that will stop working with
> Jason's patch. It's something that looks a little questionable
> and which other ULDs don't do either, so it would be a good time
> to review this practice.
Agreed.
> Additionally the SRP_DATA_DESC_INDIRECT case always uses the global
> rkey, so whenever we hit this things are going to break.
Indeed ... I think avoiding passing the global rkey over the wire in
this case means registering the indirect table explicitly. Some SRP
target implementations (e.g. the upstream ib_srpt driver) do not support
partial indirect tables and fetch the indirect table from the SRP_CMD
information unit. This means that the upstream ib_srpt driver never uses
the rkey that is passed in the indirect data buffer descriptor. Although
the upstream SRP target driver would keep working fine if an invalid
rkey were passed in the indirect data buffer header, I haven't found
any clause in the SRP specification that allows doing this, even if the
entire indirect descriptor is passed in the SRP_CMD IU.
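
For reference, a rough user-space sketch of the indirect data buffer
descriptor layout as I understand it. The field names follow
include/scsi/srp.h, but plain fixed-width types stand in for the kernel's
big-endian types, the values are treated as already converted to host byte
order, and table_fully_inline() is a hypothetical helper, not something
from either driver. The packed attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* One memory descriptor: address, memory handle (rkey), length. */
struct direct_buf {
	uint64_t va;
	uint32_t key;
	uint32_t len;
};

/* Indirect descriptor: table_desc.key is the rkey discussed above. */
struct indirect_buf {
	struct direct_buf table_desc;	/* describes the table itself */
	uint32_t len;			/* total length of the data buffers */
	struct direct_buf desc_list[];	/* descriptors carried in the IU */
} __attribute__((packed));

/*
 * Hypothetical check: non-zero if every table entry is present in the
 * SRP_CMD IU, so a target never has to fetch the table via table_desc.key.
 */
static int table_fully_inline(uint32_t table_len_bytes,
			      unsigned int descs_in_iu)
{
	return table_len_bytes == descs_in_iu * sizeof(struct direct_buf);
}
```

When this check holds, a target that only reads desc_list from the IU has
no reason to touch table_desc.key, which matches the ib_srpt behavior
described above.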
Bart.