Just as an FYI, I went back and tested mlx4 SRP, because my customer mentioned that on the older arrays he was able to set max_sectors_kb=4096. The claim was that on mlx4 they did not hit the sg map failure issue when issuing 4MB buffered I/O to a file system backed by the SRP targets. This made no sense to me, because I know the issues we addressed in Bart's patch set were in ib_srp.

I tested without Bart's latest SRP patch set (the one that throttles max_sectors_kb back to avoid the issue), and I see the same sg map failures on mlx4, using FDR rather than EDR.

So, bottom line: with larger max_sectors_kb values we will hit this issue on both mlx4 and mlx5, and for now we cannot sustain a 4MB max_sectors_kb with buffered I/O.

Laurence Oberman
Principal Software Maintenance Engineer
Red Hat Global Support Services

----- Original Message -----
From: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>
To: "Chuck Lever" <chuck.lever@xxxxxxxxxx>, "linux-rdma" <linux-rdma@xxxxxxxxxxxxxxx>
Sent: Friday, April 29, 2016 12:45:00 PM
Subject: Re: RDMA Read: Local protection error

On 04/29/2016 09:24 AM, Chuck Lever wrote:
> I've found some new behavior, recently, while testing the
> v4.6-rc Linux NFS/RDMA client and server.
>
> When certain kernel memory debugging CONFIG options are
> enabled, 1MB NFS WRITEs can sometimes result in an
> IB_WC_LOC_PROT_ERR. I usually turn on most of them because
> I want to see any problems, so I'm not sure which option
> in particular is exposing the issue.
>
> When debugging is enabled on the server, and the underlying
> device is using FRWR to register the sink buffer, an RDMA
> Read occasionally completes with LOC_PROT_ERR.
>
> When debugging is enabled on the client, and the underlying
> device uses FRWR to register the target of an RDMA Read, an
> ingress RDMA Read request sometimes gets a Syndrome 99
> (REM_OP_ERR) acknowledgement, and a subsequent RDMA Receive
> on the client completes with LOC_PROT_ERR.
>
> I do not see this problem when kernel memory debugging is
> disabled, or when the client is using FMR, or when the
> server is using physical addresses to post its RDMA Read WRs,
> or when wsize is 512KB or smaller.
>
> I have not found any obvious problems with the client logic
> that registers NFS WRITE buffers, nor the server logic that
> constructs and posts RDMA Read WRs.
>
> My next step is to bisect. But first, I was wondering if
> this behavior might be related to the recent problems with
> s/g lists seen with iSER/SRP? ie, is this a recognized
> issue?

Hello Chuck,

A few days ago I observed similar behavior with the SRP protocol, but only if I increase max_sect in /etc/srp_daemon.conf from the default to 4096. My setup was as follows:
* Kernel 4.6.0-rc5 on the initiator side.
* A whole bunch of kernel debugging options enabled on the initiator side.
* The following settings in /etc/modprobe.d/ib_srp.conf:
    options ib_srp cmd_sg_entries=255 register_always=1
* The following settings in /etc/srp_daemon.conf:
    a queue_size=128,max_cmd_per_lun=128,max_sect=4096
* Kernel 3.0.101 on the target side.
* Kernel debugging disabled on the target side.
* mlx4 driver on both sides.

Decreasing max_sge on the target side from 32 to 16 did not help. I have not yet had the time to analyze this further.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
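The thread mixes several size knobs (max_sectors_kb, max_sect, NFS wsize), which use different units. The Python sketch below is only a back-of-the-envelope comparison of those values, not a model of the ib_srp or NFS/RDMA code. It assumes that srp_daemon's max_sect counts 512-byte sectors (so max_sect=4096 would mean 2 MiB, not 4 MiB), that pages are 4 KiB, and that the buffer is page-aligned with no physically contiguous pages merged, so each page would need its own scatter/gather element. The helper names are hypothetical.

```python
# Hypothetical helpers for comparing the transfer sizes mentioned in this thread.
# Assumptions (not taken from the thread): max_sect counts 512-byte sectors,
# pages are 4 KiB, buffers are page-aligned, and no contiguous pages are merged.

SECTOR_BYTES = 512
PAGE_BYTES = 4096
KIB = 1024


def max_sectors_kb_to_bytes(max_sectors_kb: int) -> int:
    """Block-layer max_sectors_kb is expressed in KiB."""
    return max_sectors_kb * KIB


def max_sect_to_bytes(max_sect: int) -> int:
    """Assumption: srp_daemon's max_sect is a count of 512-byte sectors."""
    return max_sect * SECTOR_BYTES


def pages_spanned(nbytes: int) -> int:
    """Pages touched by a page-aligned buffer of nbytes (ceiling division)."""
    return -(-nbytes // PAGE_BYTES)


if __name__ == "__main__":
    cases = {
        "max_sectors_kb=4096": max_sectors_kb_to_bytes(4096),
        "max_sect=4096": max_sect_to_bytes(4096),
        "NFS wsize 1MB": 1024 * KIB,
        "NFS wsize 512KB": 512 * KIB,
    }
    for name, nbytes in cases.items():
        print(f"{name}: {nbytes // KIB} KiB, up to {pages_spanned(nbytes)} pages")
```

Under these assumptions, a 4 MiB request spans up to 1024 pages and a 1 MiB NFS WRITE up to 256, versus 128 for the 512KB wsize that was reported to work. The page count is a worst case; real scatter/gather lists can be shorter when pages happen to be physically contiguous.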