> -----Original Message-----
> From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx]
> Sent: Monday, September 14, 2015 5:32 PM
> To: Deneau, Tom
> Cc: ceph-users
> Subject: Re: rados bench seq throttling
>
> On Thu, Sep 10, 2015 at 1:02 PM, Deneau, Tom <tom.deneau@xxxxxxx> wrote:
> > Running 9.0.3 rados bench on a 9.0.3 cluster...
> > In the following experiments this cluster is only 2 osd nodes, 6 osds
> > each, and a separate mon node (and a separate client running rados bench).
> >
> > I have two pools populated with 4M objects.  The pools are replicated
> > x2 with identical parameters.  The objects appear to be spread evenly
> > across the 12 osds.
> >
> > In all cases I drop caches on all nodes before doing a rados bench seq test.
> > In all cases I run rados bench seq for identical times (30 seconds),
> > and in that time we do not run out of objects to read from the pool.
> >
> > I am seeing significant bandwidth differences between the following:
> >
> > * running a single instance of rados bench reading from one pool with
> >   32 threads (bandwidth approx. 300)
> >
> > * running two instances of rados bench, each reading from one of the
> >   two pools, with 16 threads per instance (combined bandwidth approx. 450)
> >
> > I have already increased the following:
> >   objecter_inflight_op_bytes = 104857600000
> >   objecter_inflight_ops = 8192
> >   ms_dispatch_throttle_bytes = 1048576000   # didn't seem to have any effect
> >
> > The disks and network are not reaching anywhere near 100% utilization.
> >
> > What is the best way to diagnose what is throttling things in the
> > one-instance case?
>
> Pretty sure the rados bench main threads are just running into their
> limits.  There's some work that Piotr (I think?) has been doing to make
> it more efficient if you want to browse the PRs, but I don't think
> they're even in a dev release yet.
> -Greg

Some further experiments with numbers of rados-bench clients:

* All of the following are reading 4M-sized objects with dropped caches as
  described above.
* When we run multiple clients, they are run on different pools but from the
  same separate client node, which is not anywhere near CPU- or network-limited.
* "threads" is the total across all clients, as is BW.

Case 1: two-node cluster, 3 osds on each node

    total      BW      BW      BW
    threads   1 cli   2 cli   4 cli
    -------   -----   -----   -----
       4       174     185     194
       8       214     273     301
      16       198     309     399
      32       226     309     409
      64       246     341     421

Case 2: one-node cluster, 6 osds on one node

    total      BW      BW      BW
    threads   1 cli   2 cli   4 cli
    -------   -----   -----   -----
       4       339     262     236
       8       465     426     383
      16       467     433     353
      32       470     432     339
      64       471     429     345

So, from the above data, having multiple clients definitely helps in the
two-node case (Case 1) but hurts in the single-node case (Case 2).

Still interested in any tools that would help analyze this more deeply...

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
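[Editor's note] For anyone reproducing this, the throttles Tom raised are ceph.conf options on the client side. A minimal sketch with the values from the post; placing them under a [client] section is my assumption, and whether each one applies to rados bench depends on the release:

```ini
# Client-side throttle settings quoted in the post (sketch, not verified
# against 9.0.3 defaults).  Section placement is an assumption.
[client]
    objecter_inflight_op_bytes  = 104857600000   ; bytes of in-flight ops allowed
    objecter_inflight_ops       = 8192           ; count of in-flight ops allowed
    ms_dispatch_throttle_bytes  = 1048576000     ; messenger dispatch throttle
```

As the post notes, raising these did not close the gap, consistent with Greg's suggestion that the bottleneck is in the rados bench client threads rather than in these throttles.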
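[Editor's note] Short of a dedicated tool, the tables above can be summarized with a few lines of Python. This is just a sketch over the numbers transcribed from the post (nothing here is Ceph-specific); it computes the ratio of 4-client to 1-client bandwidth at each total thread count:

```python
# Bandwidth numbers transcribed from the post.
# Key: total threads across all clients -> (1-client, 2-client, 4-client) BW.
case1 = {  # two-node cluster, 3 osds per node
    4: (174, 185, 194),
    8: (214, 273, 301),
    16: (198, 309, 399),
    32: (226, 309, 409),
    64: (246, 341, 421),
}
case2 = {  # one-node cluster, 6 osds
    4: (339, 262, 236),
    8: (465, 426, 383),
    16: (467, 433, 353),
    32: (470, 432, 339),
    64: (471, 429, 345),
}

def speedup(case, threads):
    """Ratio of 4-client to 1-client bandwidth at a given total thread count."""
    one_cli, _, four_cli = case[threads]
    return four_cli / one_cli

for threads in sorted(case1):
    print(f"{threads:3d} threads: case1 {speedup(case1, threads):.2f}x, "
          f"case2 {speedup(case2, threads):.2f}x")
```

At 32 total threads the ratio is above 1 for Case 1 and below 1 for Case 2, matching the observation that extra clients help the two-node cluster but hurt the single-node one.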