Re: NFS over RDMA benchmark

On Thu, Apr 18, 2013 at 12:47:09PM +0000, Yan Burman wrote:
> 
> 
> > -----Original Message-----
> > From: Wendy Cheng [mailto:s.wendy.cheng@xxxxxxxxx]
> > Sent: Wednesday, April 17, 2013 21:06
> > To: Atchley, Scott
> > Cc: Yan Burman; J. Bruce Fields; Tom Tucker; linux-rdma@xxxxxxxxxxxxxxx;
> > linux-nfs@xxxxxxxxxxxxxxx
> > Subject: Re: NFS over RDMA benchmark
> > 
> > On Wed, Apr 17, 2013 at 10:32 AM, Atchley, Scott <atchleyes@xxxxxxxx>
> > wrote:
> > > On Apr 17, 2013, at 1:15 PM, Wendy Cheng <s.wendy.cheng@xxxxxxxxx>
> > wrote:
> > >
> > >> On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman <yanb@xxxxxxxxxxxx>
> > wrote:
> > >>> Hi.
> > >>>
> > >>> I've been trying to run some benchmarks for NFS over RDMA, and I seem to
> > get only about half of the bandwidth that the HW can deliver.
> > >>> My setup consists of 2 servers, each with 16 cores, 32GB of memory, and a
> > Mellanox ConnectX3 QDR card over PCIe Gen3.
> > >>> These servers are connected to a QDR IB switch. The backing storage on
> > the server is tmpfs mounted with noatime.
> > >>> I am running kernel 3.5.7.
> > >>>
> > >>> When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
> > >>> When I run fio over RDMA-mounted NFS, I get 260-2200MB/sec for the
> > same block sizes (4-512K). Running over IPoIB-CM, I get 200-980MB/sec.
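
For reference, a raw-verbs baseline like this is typically gathered with the perftest tools; a minimal sketch, assuming the server's hostname is simply "server" (the actual invocation is not shown in the original):

    # Passive side: sweep all message sizes
    ib_send_bw -a
    # Active side: run the same sweep against the server
    ib_send_bw -a server
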
> > >
> > > Yan,
> > >
> > > Are you trying to optimize single-client performance or server performance
> > with multiple clients?
> > >
> 
> I am trying to get maximum performance from a single server - I used 2 processes in the fio test; more than 2 did not show any performance boost.
> I tried running fio from 2 different PCs against 2 different files, but the sum of the two is more or less the same as running from a single client PC.
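
A single-client run of that shape might look like the sketch below; the job parameters and the /mnt/nfs mount point are illustrative, since the original fio command line is not shown:

    # Hypothetical fio job: 2 processes, direct sequential writes, 256K blocks,
    # against files on the RDMA-mounted NFS share at /mnt/nfs
    fio --name=nfsrdma --directory=/mnt/nfs --rw=write --bs=256k \
        --size=4g --numjobs=2 --direct=1 --ioengine=libaio --iodepth=16 \
        --group_reporting
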
> 
> What I did see is that the server is working much harder than the clients; more than that, it has one core (CPU5) at 100% in a softirq tasklet:
> cat /proc/softirqs

Would any profiling help figure out which code it's spending time in?
(E.g., something as simple as "perf top" might have useful output.)
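
A minimal profiling sketch along those lines (the 10-second capture window is arbitrary):

    # Live view of the hottest kernel and user symbols
    perf top
    # Or record a system-wide profile with call graphs, then inspect it
    perf record -a -g sleep 10
    perf report
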

--b.

>                     CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
>           HI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
>        TIMER:     418767      46596      43515      44547      50099      34815      40634      40337      39551      93442      73733      42631      42509      41592      40351      61793
>       NET_TX:      28719        309       1421       1294       1730       1243        832        937         11         44         41         20         26         19         15         29
>       NET_RX:     612070         19         22         21          6        235          3          2          9          6         17         16         20         13         16         10
>        BLOCK:       5941          0          0          0          0          0          0          0        519        259       1238        272        253        174        215       2618
> BLOCK_IOPOLL:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
>      TASKLET:         28          1          1          1          1    1540653          1          1         29          1          1          1          1          1          1          2
>        SCHED:     364965      26547      16807      18403      22919       8678      14358      14091      16981      64903      47141      18517      19179      18036      17037      38261
>      HRTIMER:         13          0          1          1          0          0          0          0          0          0          0          0          1          1          0          1
>          RCU:     945823     841546     715281     892762     823564      42663     863063     841622     333577     389013     393501     239103     221524     258159     313426     234030
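
Since one core is pinned at 100% in softirq, it may also be worth checking how the HCA's completion-vector interrupts are spread across CPUs; a sketch, assuming the device shows up under "mlx4" in /proc/interrupts and using a hypothetical IRQ number 42:

    # See which CPUs are servicing the mlx4 completion vectors
    grep mlx4 /proc/interrupts
    # Re-pin a given IRQ by writing a CPU mask (0x20 = CPU5)
    echo 20 > /proc/irq/42/smp_affinity
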
> > >
> > >> Remember there are always gaps between wire speed (which ib_send_bw
> > >> measures) and real-world applications.
> 
> I realize that, but I don't expect the difference to be more than a factor of two.
> 
> > >>
> > >> That being said, does your server use the default export (sync) option?
> > >> Exporting the share with the "async" option can bring you closer to wire
> > >> speed. However, async is generally not recommended in
> > >> a real production system, as it can cause data integrity issues, e.g.
> > >> you are more likely to lose data if the boxes crash.
> 
> I am running with the async export option, but that should not matter much, since my backing storage is tmpfs mounted with noatime.
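
For concreteness, a minimal sketch of that kind of setup; the paths and the hostname "server" are hypothetical, and 20049 is the conventional NFS/RDMA port:

    # Server: tmpfs backing store, exported async (see exports(5))
    mount -t tmpfs -o noatime tmpfs /mnt/tmpfs
    # /etc/exports:  /mnt/tmpfs  *(rw,async,no_root_squash)
    exportfs -ra
    # Client: mount the export over RDMA
    mount -o rdma,port=20049 server:/mnt/tmpfs /mnt/nfs
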
> 
> > >>
> > >> -- Wendy
> > >
> > >
> > > Wendy,
> > >
> > > It has been a few years since I looked at RPCRDMA, but I seem to
> > remember that RPCs were limited to 32KB, which means that you have to
> > pipeline them to get line rate. In addition to requiring pipelining, the
> > argument from the authors was that the goal was to maximize server
> > performance, not single-client performance.
> > >
> 
> What I see is that performance increases almost linearly up to a 256K block size and falls off a little at 512K.
> 
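
Related to the RPC-size point above: the rsize/wsize mount options cap the data moved per RPC, so it may be worth confirming what the client actually negotiated; a sketch, with the 256K values purely illustrative (the server or transport may clamp them):

    # Check the rsize/wsize the client ended up with
    grep nfs /proc/mounts
    # Request larger transfers explicitly
    mount -o rdma,port=20049,rsize=262144,wsize=262144 server:/mnt/tmpfs /mnt/nfs
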
> > > Scott
> > >
> > 
> > That (client count) brings up a good point ...
> > 
> > FIO is really not a good benchmark for NFS. Does anyone have SPECsfs
> > numbers for NFS over RDMA to share?
> > 
> > -- Wendy
> 
> What do you suggest for benchmarking NFS?
> 
> Yan
> 
> 