Re: RDMA Read: Local protection error

On 04/29/2016 09:24 AM, Chuck Lever wrote:
I've found some new behavior, recently, while testing the
v4.6-rc Linux NFS/RDMA client and server.

When certain kernel memory debugging CONFIG options are
enabled, 1MB NFS WRITEs can sometimes result in an
IB_WC_LOC_PROT_ERR. I usually turn on most of them because
I want to see any problems, so I'm not sure which option
in particular is exposing the issue.

When debugging is enabled on the server, and the underlying
device is using FRWR to register the sink buffer, an RDMA
Read occasionally completes with LOC_PROT_ERR.

When debugging is enabled on the client, and the underlying
device uses FRWR to register the target of an RDMA Read, an
ingress RDMA Read request sometimes gets a Syndrome 99
(REM_OP_ERR) acknowledgement, and a subsequent RDMA Receive
on the client completes with LOC_PROT_ERR.

I do not see this problem when kernel memory debugging is
disabled, or when the client is using FMR, or when the
server is using physical addresses to post its RDMA Read WRs,
or when wsize is 512KB or smaller.

I have not found any obvious problems with the client logic
that registers NFS WRITE buffers, nor the server logic that
constructs and posts RDMA Read WRs.

My next step is to bisect. But first, I was wondering if
this behavior might be related to the recent problems with
s/g lists seen with iSER/SRP? i.e., is this a recognized
issue?

Hello Chuck,

A few days ago I observed similar behavior with the SRP protocol, but only after increasing max_sect in /etc/srp_daemon.conf from the default to 4096. My setup was as follows:
* Kernel 4.6.0-rc5 at the initiator side.
* A whole bunch of kernel debugging options enabled at the initiator
  side.
* The following settings in /etc/modprobe.d/ib_srp.conf:
  options ib_srp cmd_sg_entries=255 register_always=1
* The following settings in /etc/srp_daemon.conf:
  a queue_size=128,max_cmd_per_lun=128,max_sect=4096
* Kernel 3.0.101 at the target side.
* Kernel debugging disabled at the target side.
* mlx4 driver at both sides.

Decreasing max_sge at the target side from 32 to 16 did not help. I have not yet had the time to analyze this further.
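
For scale, the transfer sizes mentioned in this thread can be converted into page counts. This is only a rough sketch: it assumes 512-byte sectors, 4 KiB pages, and page-aligned buffers, none of which is stated above, and the helper name pages_needed is made up for illustration.

```python
# Assumptions (not stated in the thread): 512-byte sectors, 4 KiB pages,
# and page-aligned buffers.
SECTOR_SIZE = 512
PAGE_SIZE = 4096


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Return the number of pages covering nbytes (ceiling division)."""
    return -(-nbytes // page_size)


# Transfer sizes taken from the two reports in this thread.
sizes = {
    "NFS wsize 512KB": 512 * 1024,
    "NFS wsize 1MB": 1024 * 1024,
    "SRP max_sect=4096": 4096 * SECTOR_SIZE,
}

for label, nbytes in sizes.items():
    print(f"{label}: {nbytes} bytes, {pages_needed(nbytes)} pages")
```

Under these assumptions the 1MB NFS WRITE and the max_sect=4096 SRP request span 256 and 512 pages respectively, versus 128 pages for the 512KB case that does not fail. Whether the page count itself matters here has not been established.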

Bart.