> -----Original Message-----
> From: Wendy Cheng [mailto:s.wendy.cheng@xxxxxxxxx]
> Sent: Monday, April 29, 2013 08:35
> To: J. Bruce Fields
> Cc: Yan Burman; Atchley, Scott; Tom Tucker; linux-rdma@xxxxxxxxxxxxxxx;
> linux-nfs@xxxxxxxxxxxxxxx; Or Gerlitz
> Subject: Re: NFS over RDMA benchmark
>
> On Sun, Apr 28, 2013 at 7:42 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> >> On Wed, Apr 17, 2013 at 7:36 AM, Yan Burman
> >> When running ib_send_bw, I get 4.3-4.5 GB/sec for block sizes 4-512K.
> >> When I run fio over RDMA-mounted NFS, I get 260-2200 MB/sec for the
> >> same block sizes (4-512K). Running over IPoIB-CM, I get 200-980 MB/sec.
> ...
> [snip]
>
> >> 36.18%  nfsd  [kernel.kallsyms]  [k] mutex_spin_on_owner
> >
> > That's the inode i_mutex.
> >
> >> 14.70%-- svc_send
> >
> > That's the xpt_mutex (ensuring rpc replies aren't interleaved).
> >
> >> 9.63%   nfsd  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
> >
> > And that (and __free_iova below) looks like iova_rbtree_lock.
>
> Let's revisit your command:
> "FIO arguments: --rw=randread --bs=4k --numjobs=2 --iodepth=128
> --ioengine=libaio --size=100000k --prioclass=1 --prio=0 --cpumask=255
> --loops=25 --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1
> --norandommap --group_reporting --exitall --buffered=0"

I tried block sizes from 4-512K. 4K does not give 2.2 GB/sec; optimal bandwidth is achieved around a 128-256K block size.

> * inode's i_mutex:
> If increasing the process/file count didn't help, maybe increasing "iodepth"
> (say 512?) could offset the i_mutex overhead a little bit?

I tried different iodepth values, but found no improvement above an iodepth of 128.

> * xpt_mutex:
> (no idea)
>
> * iova_rbtree_lock
> DMA mapping fragmentation? I have not studied whether NFS-RDMA routines
> such as "svc_rdma_sendto()" could do better, but maybe sequential IO
> (instead of "randread") could help? Could a bigger block size (instead of 4K)
> help?

I am trying to simulate a real load (more or less), which is why I use randread. Anyhow, read does not give better performance. It is probably because the backing storage is tmpfs...

Yan
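
P.S. In case anyone wants to reproduce the sweep, below is a minimal sketch of the kind of run described above (block sizes 4-512K, iodepth up to 512). The mount point (/mnt/nfs-rdma) and test file name are placeholders, not the actual paths from this setup; the remaining flags mirror the FIO arguments quoted above (minus the priority/CPU-mask settings).

  #!/bin/sh
  # Sweep block size and iodepth for randread over the NFS-RDMA mount.
  # /mnt/nfs-rdma/testfile is a placeholder path, adjust to the real mount.
  for bs in 4k 64k 128k 256k 512k; do
      for depth in 64 128 256 512; do
          fio --name=rdma-sweep --filename=/mnt/nfs-rdma/testfile \
              --rw=randread --bs=$bs --iodepth=$depth --numjobs=2 \
              --ioengine=libaio --direct=1 --buffered=0 --size=100000k \
              --norandommap --randrepeat=1 --invalidate=1 \
              --group_reporting --exitall
      done
  done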