On Sun, Apr 28, 2013 at 10:42:48AM -0400, J. Bruce Fields wrote:
> On Sun, Apr 28, 2013 at 06:28:16AM +0000, Yan Burman wrote:
> > > > On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman <yanb@xxxxxxxxxxxx> wrote:
> > > > > I've been trying to do some benchmarks for NFS over RDMA, and I seem
> > > > > to get only about half of the bandwidth that the HW can give me.
> > > > > My setup consists of 2 servers, each with 16 cores, 32GB of memory,
> > > > > and a Mellanox ConnectX3 QDR card over PCIe gen3.
> > > > > These servers are connected to a QDR IB switch. The backing storage
> > > > > on the server is tmpfs mounted with noatime.
> > > > > I am running kernel 3.5.7.
> > > > >
> > > > > When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
> > > > > When I run fio over RDMA-mounted NFS, I get 260-2200 MB/sec for the
> > > > > same block sizes (4-512K). Running over IPoIB-CM, I get 200-980 MB/sec.
...
> > > I am trying to get maximum performance from a single server - I used 2
> > > processes in the fio test - more than 2 did not show any performance boost.
> > > I tried running fio from 2 different PCs on 2 different files, but the
> > > sum of the two is more or less the same as running from a single client PC.
> > >
> > > What I did see is that the server is sweating a lot more than the
> > > clients, and more than that, it has 1 core (CPU5) at 100% in a softirq
> > > tasklet:
> > > cat /proc/softirqs
...
> > Perf top for the CPU with the high tasklet count gives:
> >
> >   samples  pcnt   RIP               function             DSO
...
> >   2787.00  24.1%  ffffffff81062a00  mutex_spin_on_owner  /root/vmlinux
...
> Googling around....
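(The exact fio invocation isn't spelled out in the thread; a representative jobfile matching the setup described above - 2 processes against a tmpfs-backed NFS/RDMA mount, 4K-512K block sizes - might look like the following. Every parameter here, including the mount path, is an assumption for illustration, not copied from the thread.)

```ini
; hypothetical fio jobfile approximating the test described above
[global]
rw=write
direct=1
ioengine=libaio
iodepth=32
bs=256k                  ; sweep 4k..512k across runs
size=4g
runtime=60
time_based
directory=/mnt/nfsrdma   ; assumed NFS-over-RDMA mount point
numjobs=2                ; "2 processes in the fio test"

[bw-test]
```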
> > > I think we want:
> > >
> > > 	perf record -a --call-graph
> > > 	(give it a chance to collect some samples, then ^C)
> > > 	perf report --call-graph --stdio
> >
> > Sorry it took me a while to get perf to show the call trace (I did not
> > enable frame pointers in the kernel and struggled with perf options...),
> > but what I get is:
> >
> >   36.18%  nfsd  [kernel.kallsyms]  [k] mutex_spin_on_owner
> >           |
> >           --- mutex_spin_on_owner
> >              |
> >              |--99.99%-- __mutex_lock_slowpath
> >              |          mutex_lock
> >              |          |
> >              |          |--85.30%-- generic_file_aio_write

That's the inode i_mutex.

Looking at the code.... With CONFIG_MUTEX_SPIN_ON_OWNER, the mutex code spins
(instead of sleeping) as long as the lock owner is still running. So this is
just a lot of contention on the i_mutex, I guess. Not sure what to do about
that.

--b.