Re: NFS over RDMA benchmark


 



On 4/24/2013 2:04 PM, Wendy Cheng wrote:
On Wed, Apr 24, 2013 at 9:27 AM, Wendy Cheng <s.wendy.cheng@xxxxxxxxx> wrote:
On Wed, Apr 24, 2013 at 8:26 AM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
On Wed, Apr 24, 2013 at 11:05:40AM -0400, J. Bruce Fields wrote:
On Wed, Apr 24, 2013 at 12:35:03PM +0000, Yan Burman wrote:



Perf top for the CPU with the high tasklet count gives:

              samples  pcnt         RIP        function                    DSO
              _______ _____ ________________ ___________________________ ___________________________________________________________________

              2787.00 24.1% ffffffff81062a00 mutex_spin_on_owner         /root/vmlinux

I guess that means lots of contention on some mutex?  If only we knew
which one.... perf should also be able to collect stack statistics; I
forget how.

Googling around....  I think we want:

         perf record -a --call-graph
         (give it a chance to collect some samples, then ^C)
         perf report --call-graph --stdio
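
For example (a sketch; option spellings vary between perf versions, and
-S/--symbols may not exist in older builds):

         # system-wide samples with call graphs, for ~30 seconds
         perf record -a -g -- sleep 30

         # call-graph report; optionally restrict it to the hot symbol from perf top
         perf report --stdio
         perf report --stdio --symbols=mutex_spin_on_owner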


I have not looked at the NFS RDMA (and 3.x kernel) source yet. But do you see
that "rb_prev" up in the #7 spot? Do we have a red-black tree somewhere
in these paths? Trees like that require extensive locking.
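
A quick way to check (a sketch, assuming a kernel or OFA source tree is
available locally) is to grep the transport paths for rbtree API calls:

         # look for red-black tree users in the RPC and RDMA paths
         grep -rnE 'rb_(insert_color|erase|prev|next|entry)' net/sunrpc/ drivers/infiniband/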


So I did a quick read of the sunrpc/xprtrdma source (based on the OFA 1.5.4.1
tarball)... Here is a random thought (not related to the rb-tree
comment):

The in-flight packet count seems to be controlled by
xprt_rdma_slot_table_entries, which is currently hard-coded to
RPCRDMA_DEF_SLOT_TABLE (32) (?).  I'm wondering whether it would help
the bandwidth numbers if we pumped it up to, say, 64. I'm not sure
whether the FMR pool size would need to be adjusted accordingly, though.

1)

The client slot count is not hard-coded; it can easily be changed by
writing a value to /proc and initiating a new mount. But I doubt that
increasing the slot table will improve performance much, unless this is
a small-random-read, spindle-limited workload.
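
For reference, a minimal sketch of that runtime knob, assuming the client
exposes it as /proc/sys/sunrpc/rdma_slot_table_entries (the sysctl name may
differ between kernel and OFED builds; the server, export and mount point
below are hypothetical):

         # bump the RPC/RDMA request slot table; the value is read at mount time
         echo 64 > /proc/sys/sunrpc/rdma_slot_table_entries
         cat /proc/sys/sunrpc/rdma_slot_table_entries

         # re-mount so the new value takes effect (20049 is the usual NFS/RDMA port)
         umount /mnt/nfsrdma
         mount -t nfs -o rdma,port=20049 server:/export /mnt/nfsrdma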

2)

The observation appears to be that the bandwidth is server-CPU-limited.
Increasing the load offered by the client probably won't move the needle
until that's addressed.



In short, if anyone has a benchmark setup handy, bumping up the slot
table size as follows might be interesting:

--- ofa_kernel-1.5.4.1.orig/include/linux/sunrpc/xprtrdma.h	2013-03-21 09:19:36.233006570 -0700
+++ ofa_kernel-1.5.4.1/include/linux/sunrpc/xprtrdma.h	2013-04-24 10:52:20.934781304 -0700
@@ -59,7 +59,7 @@
  * a single chunk type per message is supported currently.
  */
 #define RPCRDMA_MIN_SLOT_TABLE (2U)
-#define RPCRDMA_DEF_SLOT_TABLE (32U)
+#define RPCRDMA_DEF_SLOT_TABLE (64U)
 #define RPCRDMA_MAX_SLOT_TABLE (256U)

 #define RPCRDMA_DEF_INLINE  (1024)     /* default inline max */

-- Wendy





