Re: RDMA Read: Local protection error

On 4/29/2016 9:58 AM, Chuck Lever wrote:

On Apr 29, 2016, at 12:44 PM, Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx> wrote:



On 4/29/2016 9:24 AM, Chuck Lever wrote:
I've found some new behavior, recently, while testing the
v4.6-rc Linux NFS/RDMA client and server.

When certain kernel memory debugging CONFIG options are
enabled, 1MB NFS WRITEs can sometimes result in an
IB_WC_LOC_PROT_ERR. I usually turn on most of them because
I want to see any problems, so I'm not sure which option
in particular is exposing the issue.

When debugging is enabled on the server, and the underlying
device is using FRWR to register the sink buffer, an RDMA
Read occasionally completes with LOC_PROT_ERR.

When debugging is enabled on the client, and the underlying
device uses FRWR to register the target of an RDMA Read, an
ingress RDMA Read request sometimes gets a Syndrome 99
(REM_OP_ERR) acknowledgement, and a subsequent RDMA Receive
on the client completes with LOC_PROT_ERR.

I do not see this problem when kernel memory debugging is
disabled, or when the client is using FMR, or when the
server is using physical addresses to post its RDMA Read WRs,
or when wsize is 512KB or smaller.

I have not found any obvious problems with the client logic
that registers NFS WRITE buffers, nor the server logic that
constructs and posts RDMA Read WRs.

One possibility here could be a mismatch between the posted
Send and Receive WRs. Can you check whether, in certain cases,
you are posting Receive WRs that can't handle what the Send is
putting on the wire?

I've confirmed that the client is posting only 1024-byte
Receive buffers, and that the ib_sge for each Receive
operation is the same before and after the Receive is
posted (i.e., the Receive ib_sge is valid and is not
getting overwritten somehow).
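
The before/after comparison described above can be sketched roughly as
follows. This is only a toy illustration in Python, not the client code:
`Sge` and `changed_fields` are hypothetical names, and the three fields
simply mirror the addr, length and lkey members of an ib_sge.

```python
from typing import NamedTuple


class Sge(NamedTuple):
    """Hypothetical stand-in for the addr/length/lkey triple of an ib_sge."""
    addr: int
    length: int
    lkey: int


def changed_fields(before: Sge, after: Sge) -> list:
    """Return the names of the fields that differ between two snapshots."""
    return [name for name in Sge._fields
            if getattr(before, name) != getattr(after, name)]


# Snapshot taken before posting the Receive, and again afterwards.
# The values here are made up for illustration.
before = Sge(addr=0x1000, length=1024, lkey=7)
after = Sge(addr=0x1000, length=1024, lkey=7)
print(changed_fields(before, after))  # an empty list means nothing was overwritten
```

An empty result corresponds to the observation above: the SGE is the same
on both sides of the post.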

The wire traffic contains Send Only requests of 230 or so
bytes. If an ingress Send is too large, the Receive should
complete with IB_WC_LOC_LEN_ERR, not IB_WC_LOC_PROT_ERR.
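
To make the distinction concrete, here is a minimal sketch of the
reasoning, written as a toy Python model rather than real verbs code.
`WcStatus`, `RecvBuffer` and `complete_receive` are hypothetical names,
and the order of the two checks is an assumption made for illustration;
an actual HCA may evaluate them differently.

```python
from dataclasses import dataclass
from enum import Enum


class WcStatus(Enum):
    """Toy subset of completion statuses, named after the kernel constants."""
    SUCCESS = "IB_WC_SUCCESS"
    LOC_LEN_ERR = "IB_WC_LOC_LEN_ERR"
    LOC_PROT_ERR = "IB_WC_LOC_PROT_ERR"


@dataclass
class RecvBuffer:
    length: int        # bytes described by the posted Receive SGE
    lkey_valid: bool   # assumption: True if the lkey covers the buffer with local write access


def complete_receive(buf: RecvBuffer, send_len: int) -> WcStatus:
    """Toy model of how an ingress Send completes against a posted Receive."""
    if send_len > buf.length:
        # The payload does not fit in the posted buffer.
        return WcStatus.LOC_LEN_ERR
    if not buf.lkey_valid:
        # The payload fits, but the local registration does not permit the write.
        return WcStatus.LOC_PROT_ERR
    return WcStatus.SUCCESS


# A 230-byte Send into a 1024-byte Receive buffer cannot be a length error.
print(complete_receive(RecvBuffer(1024, True), 230))
print(complete_receive(RecvBuffer(1024, False), 230))
print(complete_receive(RecvBuffer(1024, True), 4096))
```

Under this model, a LOC_PROT_ERR on a Send that fits in the buffer points
at the local registration, not at the size of the posted Receive.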

You are right. What I described is the IB_WC_LOC_LEN_ERR scenario,
not IB_WC_LOC_PROT_ERR.

Regards,
Santosh