Re: NFS over RDMA benchmark

Tom Talpey <tom@xxxxxxxxxx> · Tue, 30 Apr 2013 09:05:06 -0400

On 4/30/2013 1:09 AM, Yan Burman wrote:

-----Original Message-----
From: J. Bruce Fields [mailto:bfields@xxxxxxxxxxxx]
Sent: Sunday, April 28, 2013 17:43
To: Yan Burman
Cc: Wendy Cheng; Atchley, Scott; Tom Tucker; linux-rdma@xxxxxxxxxxxxxxx;
linux-nfs@xxxxxxxxxxxxxxx; Or Gerlitz
Subject: Re: NFS over RDMA benchmark

On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:
On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman <yanb@xxxxxxxxxxxx> wrote:
I've been trying to do some benchmarks for NFS over RDMA and I seem to
only get about half of the bandwidth that the HW can give me.
My setup consists of 2 servers each with 16 cores, 32GB of memory, and
Mellanox ConnectX3 QDR card over PCI-e gen3.
These servers are connected to a QDR IB switch. The backing storage on
the server is tmpfs mounted with noatime.
I am running kernel 3.5.7.

When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
When I run fio over rdma mounted nfs, I get 260-2200MB/sec for the
same block sizes (4-512K). Running over IPoIB-CM, I get 200-980MB/sec.
...
I am trying to get maximum performance from a single server - I used 2
processes in fio test - more than 2 did not show any performance boost.
I tried running fio from 2 different PCs on 2 different files, but the
sum of the two is more or less the same as running from single client PC.

I finally got up to 4.1GB/sec bandwidth with RDMA (IPoIB-CM bandwidth
is also way higher now).
For some reason, when I had the Intel IOMMU enabled, the performance
dropped significantly.
I now get up to ~95K IOPS and 4.1GB/sec bandwidth.

Excellent, but is that 95K IOPS a typo? At 4KB, that's less than 400MBps.
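
As a quick check of that arithmetic, here is a minimal sketch relating
IOPS, block size and bandwidth. The 4 KiB block size is the assumption
made in the question above (the report does not say which block size
produced the 95K IOPS figure), and the function names are made up for
illustration.

```python
def bandwidth_mb_per_s(iops, block_bytes):
    """Bandwidth in decimal MB/s for a given IOPS rate and block size."""
    return iops * block_bytes / 1e6


def iops_for_bandwidth(bytes_per_s, block_bytes):
    """IOPS needed to sustain a given bandwidth at a given block size."""
    return bytes_per_s / block_bytes


if __name__ == "__main__":
    # Assumption: 4 KiB (4096-byte) blocks, as in the question above.
    print(f"95K IOPS at 4 KiB: {bandwidth_mb_per_s(95_000, 4096):.0f} MB/s")
    print(f"IOPS for 4.1 GB/s at 4 KiB: {iops_for_bandwidth(4.1e9, 4096):,.0f}")
```

At 4 KiB, 95K IOPS works out to roughly 389 MB/s, while 4.1 GB/s at
4 KiB would need about a million IOPS, so the two figures presumably
come from different block sizes.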

What is the client CPU percentage you see under this workload, and
how different are the NFS/RDMA and NFS/IPoIB overheads?

Now I will take care of the issue that I am running only at 40Gbit/s
instead of 56Gbit/s, but that is another unrelated problem (I suspect
I have a cable issue).

This is still strange, since ib_send_bw with the Intel IOMMU enabled
did get up to 4.5GB/sec, so why did the Intel IOMMU affect only the
NFS code?

You'll need to do more profiling to track that down. I would suspect
that ib_send_bw is using some sort of direct hardware access, bypassing
the IOMMU management and possibly performing no dynamic memory registration.

The NFS/RDMA code goes via the standard kernel DMA API, and correctly
registers/deregisters memory on a per-i/o basis in order to provide
storage data integrity. Perhaps there are overheads in the IOMMU
management which can be addressed.
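
To illustrate why a fixed per-i/o cost (such as memory registration or
IOMMU mapping) would hurt small block sizes the most, here is a rough
model. It assumes a single serialized stream of requests, and the wire
rate and per-i/o overhead passed in are hypothetical placeholders, not
measurements from this setup; nothing here is taken from the NFS/RDMA
code itself.

```python
def effective_bandwidth(block_bytes, wire_bytes_per_s, per_io_overhead_s):
    """Bytes/s achieved when every i/o pays a fixed setup cost.

    Simplified model: one request at a time, no pipelining.
    """
    transfer_s = block_bytes / wire_bytes_per_s
    return block_bytes / (transfer_s + per_io_overhead_s)


if __name__ == "__main__":
    WIRE = 4.5e9      # hypothetical wire rate in bytes/s
    OVERHEAD = 10e-6  # hypothetical fixed cost per i/o, in seconds
    for kib in (4, 64, 512):
        bw = effective_bandwidth(kib * 1024, WIRE, OVERHEAD)
        print(f"{kib:4d} KiB blocks: {bw / 1e6:8.0f} MB/s")
```

In this model the fixed cost dominates at small block sizes and is
amortized at large ones, which is one reason profiling the per-i/o
path is a reasonable place to start.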